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LISP constructs such as AND, OR, and COND, are expressed as macros In 
terms of the applicative basis set. A small number of optimization tech- 
niques, coupled with the treatment of function calls as GOTO statements, 
serve to produce code as good as that produced by more traditional compilers. 
The macro approach enables speedy implementation of new constructs as 
desired without sacrificing efficiency in the generated code. 

A fair amount of analysis is devoted to determining whether environments 
may be stack-allocated or must be heap-allocated. Heap-allocated environ- 
ments are necessary in general because SCHEME (unlike Algol 60 and Algol 68, 
for example) allows procedures with free lexically scoped variable to be 
returned as the values of other procedures: the- Algol stack-allocation environ- 
ment strategy does not suffice. The methods used here Indicate that a • 
heap-allocating general izai ton of the "display 11 technique leads to an efficient 
implementation of such "upward funargs". Moreover, compile-time optimiza- 
tion and analysis can eliminate many "funargs" entirely, and so far fewer 
environment structures need be allocated at run time than might be 
expected. 

A subset of SCHEME (rather than triples, for example) serves as the 
representation intermedieate between the optimized SCHEME code and the final 
output code; code is expressed in this subset in the so' called const Inua- 
tion-passinq style. As a subset of SCHEME, it enjoys the same theoretical 
properties; one could even apply the same optimizer used on the input code 
to the intermediate code. However, the subset is so chosen that all temp- 
orary quantities are made manifest as variables, and no control stack Is 
needed to evaluate it. As a result, this apparently applicative represen- 
tation admits an imperative interpretation which permits easy transcription 
to fianl imperative machine code. These qualities suggest that an applica- 
tive language like SCHEME is a better candidate for an UNCOL than the more 
imperative candidates proposed to date. 
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Note 

The first part of this report is a slightly revised version of a 
dissertation submitted in May 1977. Where it was of historical interest to 
reflect changes in the SCHEME language which ocurred in the following year and 
the effect they had on RABBIT, the text was left intact, with notes added of the 
form, "Since the dissertation was written, thus-and-so occurred." The second 
part, the Appendix, was not part of the dissertation, and is a complete listing 
of the source code for RABBIT, with extensive commentary. 

It is intended that the first part should be self-contained, and provide 
a qualitative overview of the compilation methods used in RABBIT. The second 
part is provided for those readers who would like to examine the precise 
mechanisms used to carry out the general methods. 

Thus there are five levels of thoroughness at which the reader may 
consume this document: 

(1) The reader who wishes only to skim is advised to read sections 1, 5, 6, 
possibly 7, 8A, 8B, 8C, 10, 11, and 12. This will give a basic overview, 
including the use of macros and the optimizing techniques. 

(2) The reader who also wants to know about the details of SCHEME, the run-time 
system, and a long example is advised to read the entire main text (about a third 
of the document) . 

(3) The reader who wants to understand the low-level organization of the 
algorithms, and read about the more tricky special cases, should read the main 
text and then the commentary on the code. 

(4) The reader who additionally wants to understand the nit-picking details 
should read the code along with the commentary. 

(5) The reader who wants a real feel for the techniques involved should read the 
entire document, invent three new SCHEME constructs and write macros for them, 
and then reimplement the compiler for another run-time environment. (He ought 
please also to send a copy of any documents on such a project to this author, who 
would be very interested!) 
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1. Introduction 



The work described here is a continuation (!) of that described in 
[SCHEME], [Imperative], and [Declarative]. Before enumerating the points of the 
thesis, we summarize here each of these documents. 



A. Background 

In [SCHEME] we (Gerald Jay Sussman and the author) described the 
implementation of a dialect of LISP named SCHEME with the properties of lexical 
scoping and tail-recursion; this implementation is embedded within MacLISP 
[Moon], a version of LISP which does not have these properties. The property of 
lexical scoping (that a variable can be referenced only from points textually 
within the expression which binds it) is a consequence of the fact that all 
functions are closed in the -binding environment". [Moses] That is, SCHEME is a 
"full-funarg" LISP dialect. (Note Full-Funarg Example} The property of tail- 
recursion implies that loops written in an apparently recursive form will 
actually be executed in an iterative fashion. Intuitively, function calls do not 
"push control stack"; instead, it is argument evaluation which pushes control 
stack. The two properties of lexical scoping and tail-recursion are not 
independent. In most LISP systems [LISP1.5M] [Moon] [Teitelman], which use 
dynamic scoping rather than lexical, tail-recursion is impossible because 
function calls must push control stack in order to be able to undo the dynamic 
bindings after the return of the function. On the other hand, it is possible to 
have a lexically scoped LISP which does not tail-recurse, but it is easily seen 
that such an implementation only wastes storage space needlessly compared to a 
tail-recursing implementation. [Steele] Together, these two properties cause 
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SCHEME to reflect lambda-calculus semantics much more closely than dynamically 
scoped LISP systems. SCHEME also permits the treatment of functions as full- 
fledged data objects; they may be passed as arguments, returned as values, made 
part of composite data structures, and notated as independent, unnamed 
("anonymous") entities. (Contrast this with most ALGOL- like languages, in which 
a function can be written only by declaring it and giving it a name; imagine 
being able to use an integer value only by giving it a name in a declaration!) 
The property of lexical scoping allows this to be done in a consistent manner 
without the possibility of identifier conflicts (that is, SCHEME "solves the 
FUNARG problem" [Moses]). In [SCHEME] we also discussed the technique of 
"continuation-passing style", a way of writing programs in SCHEME such that no 
function ever returns a value. 

In [Imperative] we explored ways of exploiting these properties to 
implement most traditional programming constructs, such as assignment, looping, 
and call-by-name, in terms of function application. Such applicative (lambda- 
calculus) models of programming language constructs are well-known to 
theoreticians (see [Stoy], for example), but have not been used in a practical 
programming system. All of these constructs are actually made available in 
SCHEME by macros which expand into these applicative definitions. This technique 
has permitted the speedy implementation of a rich user-level language in terms of 
a very small, easy-to-implement basis set of primitive constructs. In 
[Imperative] we continued the exploration of continuation-passing style, and 
noted that the escape operator CATCH is easily modelled by transforming a program 
into this style. We also pointed out that transforming a program into this style 
enforces a particular order of argument evaluation, and makes all intermediate 
computational quantities manifest as variables. 

In [Declarative] we examined more closely the issue of tail-recursion, 



and demonstrated that the usual view of function calls as pushing a return 
address must lead to an either inefficient or inconsistent implementation, while 
the tail-recursive approach of SCHEME leads to a uniform discipline in which 
function calls are treated as GOTO statements which also pass arguments. We also 
noted that a consequence of lexical scoping is that the only code which can 
reference the value of a variable in a given environment is code which is closed 
in that environment or which receives the value as an argument; this in turn 
implies that a compiler can structure a run-time environment in any arbitrary 
fashion, because it will compile all the code which can reference that 
environment, and so can arrange for that code to reference it in the appropriate 
manner. Such references do not require any kind of search (as is commonly and 
incorrectly believed in the LISP community because of early experience with LISP 
interpreters which search a-lists) because the compiler can determine the precise 
location of each variable in an environment at compile time. It is not necessary 
to use a standard format, because neither interpreted code nor other compiled 
code can refer to that environment. (This is to be constrasted with "spaghetti 
stacks" [Bobrow and Wegbreit].) In [Declarative] we also carried on the analysis 
of continuation-passing style, and noted that transforming a program into this 
style elucidates traditional compilation issues such as register allocation 
because user variables and intermediate quantities alike are made manifest as 
variables on an equal footing. Appendix A of [Declarative] contained an 
algorithm for converting any SCHEME program (not containing ASET) to 
continuation-passing style. 

We have implemented two compilers for the language SCHEME. The purpose 
was to explore compilation techniques for a language modelled on lambda-calculus, 
using lambda-calculus-style models of imperative programming constructs. Both 
compilers use the strategy of converting the source program to continuation- 
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passing style. 

The first compiler (known as CHEAPY) was written as a throw-away 
implementation to test the concept of conversion to continuation-passing style. 
The first half of CHEAPY is essentially the algorithm which appears in Appendix A 
of [Declarative], and the second is a simple code generator with almost no 
optimization. In conjunction with the writing of CHEAPY, the SCHEME interpreter 
was modified to interface to compiled functions. (This interface is described 
later in this report.) 

The second compiler, with which we are primarily concerned here, is known 
as RABBIT. It, like CHEAPY, is written almost entirely in SCHEME (with minor 
exceptions due only to problems in interfacing with certain MacLISP I/O 
facilities). Unlike CHEAPY, it is fairly clever. It is intended to demonstrate 
a number of optimization techniques relevant to lexical environments and tail- 
recursive control structures. (The code for RABBIT, with commentary, appears in 
the Appendix. ) 



B. The Thesis 

(1) Function calls are not expensive when compiled correctly; they should be 
thought of as GOTO statements that happen to pass arguments. 

(2) The combination of cheap function calls, lexical scoping, tail-recursion, 
and "anonymous" notation of functions (which are not independent properties 
of a language, but aspects of a single unified approach) permits the 
definition of a wide variety of "imperative" constructs in applicative 
terms. Because these properties result from adhering to the principles of 
the well-known lambda-calculus [Church], such definitions can be lifted 
intact from existing literature and used directly. 
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(3) A macro facility (the ability to specify syntactic transformations) makes it 
practical to use these as the only definitions of imperative constructs in a 
programming system. Such a facility makes it extremely easy to define new 
constructs. 

(4) A few well-chosen optimization strategies enable the compilation of these 
applicative definitions into the imperative low-level code which one would 
expect from a traditional compiler. 

(5) The macro facility and the optimization techniques used by the compiler can 
be conceptually unified. The same properties which make it easy to write 
the macros make it easy to define optimizations correctly. In the same way 
that many programming constructs are defined in terms of a small, well- 
chosen basis set, so a large number of traditional optimization techniques 
fall out as special cases of the few used in RABBIT. This is no accident. 
The separate treatment of a large and diverse set of constructs necessitates 
separate optimization techniques for each. As the basis set of constructs 
is reduced, so is the set of interesting transformations. If the basis set 
is properly chosen, their combined effect is "multiplicative" rather than 
"additive". 

(6) The technique of compiling by converting to continuation-passing style 
elucidates some important compilation issues in a natural way. Intermediate 
quantities are made manifest; so is the precise order of evaluation. 
Moreover, this is all expressed in a language isomorphic to a subset of the 
source language SCHEME; as a result the continuation-passing style version 
of a program inherits many of the philosophical and practical advantages. 
For example, the same optimization techniques can be applied at this level 
as at the original source level. While the use of continuation-passing 
style may not make the decisions any easier, it provides an effective and 
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natural way to express the results of those decisions. 

(7) Continuation-passing style, while apparently applicative in nature, admits a 
peculiarly imperative interpretation as a consequence of the facts that it 
requires no control stack to be evaluated and that no functions ever return 
values. As a result, it is easily converted to an imperative machine 
language. 

(8) A SCHEME compiler should ideally be a designer of good data structures, 
since it may choose any representation whatsoever for environments. RABBIT 
has a rudimentary design knowledge, involving primarily the preferral of 
registers to heap-allocated storage. However, there is room for knowledge 
of "bit-diddling" representations. 

(9) We suggest that those who have tried to design useful UNCOL's (UNiversal 
Computer-Oriented Languages) [Sammet] [Coleman] have perhaps been thinking 
too imperatively, and worrying more about data manipulation primitives than 
about environment and control issues. As a result, proposed UNCOLs have 
been little more than generalizations of contemporary machine languages. We 
suggest that SCHEME makes an ideal UNCOL at two levels. The first level is 
the fully applicative level, to which most source-language constructs are 
easily reduced; the second is the continuation-passing style level, which 
is easily reduced to machine language. We envision building a compiler in 
three stages: (a) reduction of a user language to basic SCHEME, whether by 
macros, a parser of algebraic syntax, or some other means; (b) optimization 
by means of SCHEME-level source-to-source transformations, and conversion to 
continuation-passing style; and (c) generation of code for a particular 
machine. RABBIT addresses itself to the second stage. Data manipulation 
primitives are completely ignored at this stage, and are just passed along 
from input to output. These primitives, whether integer arithmetic, string 
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concatenation and parsing, or list structure manipulators, are chosen as a 
function of a particular source language and a particular target machine. 
RABBIT deals only with fundamental environment and control issues common to 
most modes of algorithmic expression. 

(10) While syntactic issues tend to be rather superficial, we point out that 
algebraic syntax tends to obscure the fundamental nature of function calling 
and tail-recursion by arbitrarily dividing functions into syntactic classes 
such as "operators" and "functions". ([Standish], for example, uses much 
space to exhibit each conceptually singular transformation in a multiplicity 
of syntactic manifestations.) The lack of an "anonymous" notation for 
functions in most algebraic languages, and the inability to treat functions 
as data objects, is a distinct disadvantage. The uniformity of LISP syntax 
makes these issues easier to deal with. 

To the LISP community in particular we address these additional points: 

(11) Lexical scoping need not be as expensive as is commonly thought. Experience 
with lexically-scoped interpreters is misleading; lexical scoping is not 
inherently slower than dynamic scoping. While some implementations may 
entail access through multiple levels of structure, this occurs only under 
circumstances (accessing of variables through multiple levels of closure) 
which could not even be expressed in a dynamically scoped language. Unlike 
deep-bound dynamic variables, compiled lexical access requires no search; 
unlike shallow-bound dynamic variables, lexical binding does not require 
that values be put in a canonical value cell. The compiler has complete 
discretion over the manipulation of environments and variable values. The 
"display" technique used in Algol implementations can be generalized to 
provide an efficient solution to the FUNARG problem. 

(12) Lexical scoping does not necessarily make LISP programming unduly difficult. 
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The very existence of RABBIT, a working compiler some fifty pages in length 
written in SCHEME, first implemented in about a month, part-time, 
substantiates this claim (which is, however, admitted to be mostly a matter 
of taste and experience). {Note Refinement of RABBIT} SCHEME has also been 
used to implement several AI problem-solving languages, including AMORD 
[Doyle]. 
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2. The Source Language - SCHEME 

The basic language processed by RABBIT is a subset of the SCHEME language 
as described in [SCHEME] and [Revised Report], the primary restrictions being 
that the first argument to ASET must be quoted and that the multiprocessing 
primitives are not accommodated. This subset is summarized here. 

SCHEME is essentially a lexically scoped ("full funarg") dialect of LISP. 
Interpreted programs are represented by S-expressions in the usual manner. 
Numbers represent themselves. Atomic symbols are used as identifiers (with the 
conventional exception of T and NIL, which are conceptually treated as 
constants). All other constructs are represented as lists. 

In order to distinguish the various other constructs, SCHEME follows the 
usual convention that a list whose car is one of a set of distinguished atomic 
symbols is treated as directed by a rule associated with that symbol. All other 
lists (those with non-atomic cars, or with undistinguished atoms in their cars) 
are combinations , or function calls. All subforms of the list are uniformly 
evaluated in an unspecified order, and then the value of the first (the function) 
is applied to the values of all the others (the arguments). Notice that the 
function position is evaluated in the same way as the argument positions (unlike 
most other LISP systems). (In order to be able to refer to MacLISP functions, 
global identifiers evaluate to a special kind of functional object if they have 
definitions as MacLISP functions of the EXPR, SUBR, or LSUBR varieties. Thus 
"(PLUS 12)" evaluates to 3 because the values of the subforms are functional 
object for PLUS>, 1, and 2; and applying the first to the other two causes 
invocation of the MacLISP primitive PLUS.) 

The atomic symbols which distinguish special constructs are as follows: 
LAMBDA This denotes a function. A form (LAMBDA (varl var2 ... varn) body) 
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IF 



QUOTE 



LABELS 



will evaluate to a function of n arguments. The parameters vari are 

identifiers (atomic symbols) which may be used in the body to refer to 

the respective arguments when the function is invoked. Note that a 

LAMBDA-expression is not a function, but evaluates to one, a crucial 

distinction. 

This denotes a conditional form. (IF a b c) evaluates the predicate a, 

producing a value x; if x is non-NIL, then the consequent b is 

evaluated, and otherwise the alternative c. If c is omitted, NIL is 

assumed. 

As in all LISP systems, this provides a way to specify any S-expression 

as a constant. (QUOTE x) evaluates to the S-expression x. This may be 

abbreviated to 'x, thanks to the NacLISP read-macro-character feature. 

This primitive permits the local definition of one or more mutually 

recursive functions. The format is: 



(LABELS ((namel (LAMBDA ...)) 

(nameZ (LAMBDA ...)) 

(namen (LAMBDA ...))) 
body) 



ASET 



This evaluates the body in an environment in which the names refer to 
the respective functions, which are themselves closed in that same 
environment. Thus references to these names in the bodies of the 
LANBDA-expressions will refer to the labelled functions. (Note 
Generalized LABELS} 

This is the primitive side-effect on variables. (ASET 1 var body) 
evaluates the body, assigns the resulting value to the variable var, 
and returns that value. {Note Non-quoted ASET} For implementation- 
dependent reasons, it is forbidden by RABBIT to use ASET' on a global 



CATCH 



Macros 
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variable which is the name of a primitive MacLISP function, or on a 
variable bound by LABELS. (ASET' is actually used very seldom in 
practice anyway, and all these restrictions are "good programming 
practice". RABBIT could be altered to lift these restrictions, at some 
expense and labor.) 

This provides an escape operator facility. [Landin] [Reynolds] (CATCH 
var body) evaluates the body, which may refer to the variable var, 
which will denote an "escape function" of one argument which, when 
called, will return from the CATCH-form with the given argument as the 
value of the CATCH-form. Note that it is entirely possible to return 
from the CATCH-form several times. This raises a difficulty with 
optimization which will be discussed later. 

Any atomic symbol which has been defined in one of various ways to be a 
macro distinguishes a special construct whose meaning is determined by 
a macro function. This function has the responsibility of rewriting 
the form and returning a new form to be evaluated in place of the old 
one. In this way complex syntactic constructs can be expressed in 
terms of simpler ones. 
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3. The Target Language 

The "target language" is a highly restricted subset of MacLISP, rather 
than any particular machine language for an actual hardware machine such as the 
PDP-10. RABBIT produces MacLISP function definitions which are then compiled by 
the standard MacLISP compiler. In this way we do not need to deal with the 
uninteresting vagaries of a particular piece of hardware, nor with the 
peculiarities of the many and various data-manipulation primitives (CAR, RPLACA, 
+ , etc.)- We allow the MacLISP compiler to deal with them, and concentrate on 
the issues of environment and control which are unique to SCHEME. While for 
production use this is mildly inconvenient (since the code must be passed through 
two compilers before use), for research purposes it has saved the wasteful re- 
implementation of much knowledge already contained in the MacLISP compiler. 

On the other hand, the use of MacLISP as a target language does not by 
any means trivialize the task of RABBIT. The MacLISP function-calling mechanism 
cannot be used as a target construct for the SCHEME function call, because 
MacLISP' s function calls are not guaranteed to behave tail-recursively. Since 
tail-recursion is a most crucial characteristic distinguishing SCHEME from most 
LISP systems, we must implement SCHEME function calls by more primitive methods. 
Similarly, since SCHEME is a full-funarg dialect of LISP while MacLISP is not, we 
cannot in general use MacLISP' s variable-binding mechanisms to implement those of 
SCHEME. On the other hand, it is a perfectly legitimate optimization to use 
MacLISP mechanisms in those limited situations where they are applicable. 

Aside from ordinary MacLISP data-manipulation primitives, the only 
MacLISP constructs used in the target language are PROG, GO, RETURN, PROGN, COND, 
SETQ, and ((LAMBDA ...) ...). PROG is never nested; there is only a single, 
outer PROG. RETURN is used only in the form (RETURN NIL) to exit this outer 
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PROG; it is never used to return a value of any kind. LAMBDA-expressions are 
used only to bind temporary variables. In addition, CONS, CAR, CDR, RPLACA, and 
RPLACD are used in the creation and manipulation of environments. 

We may draw a parallel between each of these constructs and an equivalent 
machine-language (or rather, assembly language) construct: 
PROG A single program module. 

GO A branch instruction. PROG tags correspond to instruction labels. 

RETURN Exit from program module. 

PROGN Sequencing of several instructions. 

COND Conditional branches, used in a disciplined manner. One may think of 

(COND (predl valuel) 
(predZ value2) 

(predn valuen)) 
as representing the sequence of code 



TAG1 



TAG2: 



TAGn: 
ENDTAG: 



<code for predl> 
JUMP-IF-NIL regl,TAGl 
<code for valuel> 
JUMP ENDTAG 
<code for pred2> 
JUMP-IF-NIL regl,TAGZ 
<code for value2> 
JUMP ENDTAG 

<code for predn> 
JUMP-IF-NIL regl,TAGn 
<code for valuen> 
JUMP ENDTAG 
LOAD-VALUE NIL 



which admits of some optimizations, but we shall not worry about this. 
(The MacLISP compiler does, but we do not depend at all on this fact.) 
SETQ Load register, or store into memory. 
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LAMBDA We use this primarily in the form: 



((LAMBDA (ql ... qn) 

(setq varl ql) 

(setq varn qn)) 
valuel ... valuen) 



which we may think of as saving values on a temporary stack and then 
popping them into the variables: 



CONS 



<code for valuel> jleaves result in real 

PUSH regl 

<code for valuen> 
PUSH regl 
POP varn 

POP varl 

This is in fact approximately how the MacLISP compiler will treat this 
construct. This is used to effect the simultaneous assignment of 
several values to several registers. It would be possible to do 
without the MacLISP LAMBDA in this case, by using extra intermediate 
variables, but it was decided that this task was less interesting than 
other issues within RABBIT, and that assignments of this kind would 
occur sufficiently often that it was desirable to get the MacLISP 
compiler to produce the best possible code in this case. 
The form ((LAMBDA ...) ...) is also used in some situation where the 
user wrote such a form in the SCHEME code, and the arguments and 
LAMBDA-body are all "trivial", in a sense to be defined later. 
CONS is used, among other things, to "push" new values onto the current 
environment. While SCHEME variables can sometimes be represented as 
temporary MacLISP variables using LAMBDA, in general they must be kept 
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in a "consed environment" in the heap; CAR and CDR are used to "index" 
the environment "stack" (which is not really a stack, but in general 
tree-like). (N.B. By using CONS for this purpose we can push the 
entire issue of environment retention off onto the LISP garbage 
collector. It would be possible to use array-like blocks for 
environments, and an Algol-like "display" pointer discipline for 
variable access. However, a retention strategy as opposed to a 
deletion strategy must be used in general, because SCHEME, unlike Algol 
60 and 68, permits procedures to be the values of other procedures. 
Stack allocation does not suffice in general -- a heap must be used. 
Later we will see that RABBIT uses stack allocation of environments and 
a deletion strategy in simple cases, and reverts to heap allocation of 
environments and a retention strategy in more complicated situations.) 
CAR, + Primitive MacLISP operators such as + and CAR are analogous to machine- 
language instructions such as ADD and LOAD-INDEXED. We leave to the 
MacLISP compiler the task of compiling large expressions involving 
these; but we are not avoiding the associated difficult issues such as 
register allocation, for we shall have to deal with them in compiling 
calls to SCHEME functions. 
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4. The Target Machine 

Compiled code is interfaced to the SCHEME interpreter in two ways. The 
interpreter must be able to recognize functional objects which happen to be 
compiled and to invoke them with given arguments; and compiled code must be able 
to invoke any function, whether interpreted or compiled, with given arguments. 
(This latter interface is traditionally known as the w UUO Handler" as the result 
of the widespread use of the PDP-10 in implementing LISP systems. [DEC] [Moon] 
[Teitelman]) We define here an arbitrary standard form for functional objects, 
and a standard means for invoking them. 

In the PDP-10 MacLISP implementation of SCHEME, a function is, in 
general, represented as a list whose car contains one of a set of distinguished 
atomic symbols. (Notice that LAMBDA is not one of these; a LAMBDA-expression 
ma y evaluate to a function, but is not itself a valid function.) This set of 
symbols includes EXPR, SUBR, and LSUBR, denoting primitive MacLISP functions of 
those respective types; BETA, denoting a SCHEME function whose code is 
interpretive; DELTA, denoting an escape function created by the interpreter for 
a CATCH form, or a continuation given by the interpreter to compiled code; 
CBETA, denoting a SCHEME function or continuation whose code is compiled; and 
EPSILON, denoting a continuation created when compiled code invokes interpreted 
code. Each of these function types requires a different invocation convention; 
the interpreter must distinguish these types and invoke them in the appropriate 
manner. For example, to invoke an EXPR the MacLISP FUNCALL construct must be 
used. A BETA must be invoked by creating an appropriate environment, using the 
given arguments, and then interpreting the code of the function. 

We have arbitrarily defined the CBETA interface as follows: there are a 
number of "registers", in the form of global variables. Nine registers called 
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**CONIT**, **ONE**, **TWO** **EIGHT** are used to pass arguments to compiled 

functions. **CONIT** contains the continuation. The others contain the arguments 
prescribed by the user; if there are more than eight arguments, however, then 
they are passed as a list of all the arguments in register **ONE**, and the 
others are unused. (Any of a large variety of other conventions could have been 
chosen, such as the first seven arguments in seven registers and a list of all 
remaining arguments in **EIGHT**. We merely chose a convention which would be 
workable and convenient, reflect the typical finiteness of hardware register 
sets, and mirror familiar LISP conventions. The use of a list of arguments is 
analogous to the passing of an arbitrary number of arguments on a stack, 
sometimes known as the LSUBR convention. [Moon] [Declarative]) 

There is another register called **FUN**. A function is invoked by 
putting the functional object in **FUN**, its arguments in the registers already 
described, and the number of arguments in the register **NARGS**, and then 
exiting the current function. Control (at the MacLISP level) is then transferred 
to a routine (the "SCHEME UUO handler") which determines the type of the function 
in **FUN** and invokes it. 

A continuation is invoked in exactly the same manner as any other kind of 
function, with two exceptions: a continuation does not itself require a 
continuation, so **CONT** need not be set up; and a continuation always takes a 
single argument, so **NARGS** need not be set to 1. {Note Multiple- 
Argument Continuations} 

A CBETA form has additional fixed structure. Besides the atomic symbol 
CBETA in the car, there is always in the cadr the address of the code, and in the 
cddr the environment. The form of the environment is completely arbitrary as far 
as the SCHEME interpreter is concerned; indeed, the CHEAPY compiler and the 
RABBIT compiler use completely different formats for environments for compiled 
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function. (Recall that this cannot matter since the only code which will ever be 
able to access that environment is the code belonging the the functional closure 
of which that environment is a part.) The "UUO handler" puts the cddr of **FUN** 
in the register **ENV**, and then transfers to the address in the cadr of 
**FUN**. When that code eventually exits, control returns to the "UUO handler", 
which expects the code to have set up **FUN** and any necessary arguments. 

There is a set of "memory locations" -11-, -12-, ... which are used to 
hold intermediate quantities within a single user function. (Later we shall see 
that we think of these as being used to pass values between internally generated 
functions within a module. For this purpose we think of the "registers" and 

"memory locations" being arranged in a single sequence **CONT**, **ONE** 

**EIGHT**, -11-, -12-, ... There is in principle an unbounded number of these 
"memory locations", but RABBIT can determine (and in fact outputs as a 
declaration for the NacLISP compiler) the exact set of such locations used by any 
given function.) One may think of the "memory locations" as being local to each 
module, since they are never used to pass information between modules; in 
practice they are implemented as global NacLISP variables. 

The registers **FUN**, **NARGS**, **ENV**, and the argument registers are 
the only global registers used by compiled SCHEME code (other than the "memory 
locations"). Except for global variables explicitly mentioned by the user 
program, all communication between compiled SCHEME functions is through these 
registers. It is useful to note that the continuation in **CONT** is generally 
analogous to the usual "control stack" which contains return addresses, and so we 
may think of **CONT** as our "stack pointer register". 
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5. Language Design Considerations 

SCHEME is a lexically scoped ("full-funarg") dialect of LISP, and so is 
an applicative language which conforms to the spirit of the lambda-calculus. 
[Church] We divide the definition of the SCHEME language into two parts: the 
environment and control constructs, and the data manipulation primitives. 
Examples of the former are LAMBDA-expressions, combinations, and IF; examples of 
the latter are CONS, CAR, EQ, and PLUS. Note that we can conceive of a version 
of SCHEME which did not have CONS, for example, and more generally did not have 
S-expressions in its data domain. Such a version would still have the same 
environment and control constructs, and so would hold the same theoretical 
interest for our purposes here. (Such a version, however, would be less 
convenient for purposes of writing a meta-circular description of the language, 
however! ) 

By the "spirit of lambda-calculus" we mean the essential properties of 
the axioms obeyed by lambda-calculus expressions. Among these are the rules of 
alpha-conversion and beta-conversion. The first intuitively implies that we can 
uniformly rename a function parameter and all references to it without altering 
the meaning of the function. An important corollary to this is that we can in 
fact effectively locate all the references. The second implies that in a 
situation where a known function is being called with known argument expressions, 
we may substitute an argument expression for a parameter reference within the 
body of the function (provided no naming conflicts result, and that certain 
restrictions involving side effects are met). Both of these operations are of 
importance to an optimizing compiler. Another property which follows indirectly 
is that of tail-recursion. This property is exploited in expressing iteration in 
terms of applicative constructs, and is discussed in some detail in 
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[Declarative]. 

We realize that other systems of environment and control constructs also 
are reasonably concise, clear, and elegant, and can be axiomatized in useful 
ways, for example the guarded commands of Dijkstra. [Dijkstra] However, that of 
lambda-calculus is extremely well-understood, lends itself well to certain kinds 
of optimizations in a natural manner, and has behind it a body of literature 
which can be used directly by RABBIT to express non-primitive constructs. 

The desire for uniform lexical scoping arises from other motives as well, 
some pragmatic, some philosophical. Many of these are described in [SCHEME], 
[Imperative], [Declarative], and [Revised Report]. It is often difficult to 
explain some of these to those who are used to dynamically scoped LISP systems. 
Any one advantage of lexical scoping may often be countered with "Yes, but you 
can do that in this other way in a dynamically scoped LISP." However, we are 
convinced that lexical scoping in its totality provides all of the advantages to 
be described in a natural, elegant, and integrated manner, largely as a 
consequence of its great simplicity . 

There are those to whom lexical scoping is nothing new, for example the 
ALGOL community. For this audience, however, we should draw attention to another 
important feature of SCHEME, which is that functions are first-class data 
objects. They may be assigned or bound to variables, returned as values of other 
functions, placed in arrays, and in general treated as any other data object. 
Just as numbers have certain operations defined on them, such as addition, so 
functions have an important operation defined on them, namely invocation. 

The ability to treat functions as objects is not at all the same as the 
ability to treat representations of functions as objects. It is the latter 
ability that is traditionally associated with LISP; functions can be represented 
as S-expressions. In a version of SCHEME which had no S-expression primitives, 
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however, one could still deal with functions (i.e. closures) as such, for that 
ability is part of the fundamental environment and control facilities. 
Conversely, in a SCHEME which does have CONS, CAR, and CDR, there is no defined 
way to use CONS by itself to construct a function (although a primitive ENCLOSE 
is now provided which converts an S-expression representation of a function into 
a function), and the CAR or CDR of a function is in general undefined. The only 
defined operation on a function is invocation. {Note Operations on Functions) 

We draw this sharp distinction between environment and control constructs 
on the one hand and data manipulation primitives on the other because only the 
former are treated in any depth by RABBIT, whereas much of the knowledge of a 
"real" compiler deals with the latter. A PL/I compiler must have much specific 
knowledge about numbers, arrays, strings, and so on. We have no new ideas to 
present here on such issues, and so have avoided this entire area. RABBIT itself 
knows almost nothing about data manipulation primitives beyond being able to 
recognize them and pass them along to the output code, which is a small subset of 
MacLISP. In this way RABBIT can concentrate on the interesting issues of 
environment and control, and exploit the expert knowledge of data manipulation 
primitives already built into the MacLISP compiler. 
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6. The Use of Macros 

An important characteristic of the SCHEME language is that its set of 
primitive constructs is quite small. This set is not always convenient for 
expressing programs, however, and so a macro facility is provided for extending 
the expressive power of the language. A macro is best thought of as a syntax 
rewrite rule . As a simple example, suppose we have a primitive GCD which takes 
only two arguments, and we wish to be able to write an invocation of a GCD 
function with any number of arguments. We might then define (in a "production- 
rule" style) the conditional rule: 

(XGCD) => 

(XGCD x) => x 
(XGCD x . rest) => (GCD x (XGCD . rest)) 

(Notice the use of LISP dots to refer to the rest of a list.) This is not 
considered to be a definition of a function XGCD, but a purely syntactic 
transformation. In principle all such transformations could be performed before 
executing the program. In fact, RABBIT does exactly this, although the SCHEME 
interpreter naturally does it incrementally, as each macro call is encountered. 
Rather than use a separate production-rule/pattern-matching language, in 
practice SCHEME macros are defined as transformation functions from macro-call 
expressions to resulting S-expressions, just as they are in MacLISP. (Here, 
however, we shall continue to use production rules for purposes of exposition.) 
It is important to note that macros need not be written in the language for which 
they express rewrite rules; rather, they should be considered an adjunct to the 
interpreter, and written in the same language as the interpreter (or the 
compiler). To see this more clearly, consider a version of SCHEME which does not 
have S-expressions in its data domain. If programs in this language are 
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represented as S-expressions t then the interpreter for that language cannot be 
written in that language, but in another meta-language which does deal with S- 
expressions. Macros, which transform one S-expression (representing a macro 
call) to another (the replacement form, or the interpretation of the call), 
clearly should be expressed in this meta-language also. The fact that in most 
LISP systems the language and the meta-language appear to coincide is a source of 
both power and confusion. 

In the PDP-10 MacLISP implementation of SCHEME, four separate macro 
mechanisms are used in practice. One is the MacLISP read-macro mechanism [Moon], 
which performs transformations such as 'F00 => (QUOTE F00) when an expression is 
read from a file. The other three are as described earlier, processed by the 
interpreter or compiler, and differ only in that one kind is recognized by the 
MacLISP interpreter as well while the other two are used only by SCHEME, and that 
of the latter two one kind is written in MacLISP and the other kind in SCHEME 



itself. 



There is a growing library of SCHEME macros which express a variety of 
traditional programming constructs in terms of other, more primitive constructs, 
and eventually in terms of the small set of primitives. A number of these are 
catalogued in [Imperative] and [Revised Report]. Others were invented in the 
course of writing RABBIT. We shall give some examples here. 

The BLOCK macro is similar to the MacLISP PROGN; it evaluates all its 
arguments and returns the value of the last one. One critical characteristic is 
that the last argument is evaluated "tail-recursively" (I use horror quotes 
because normally we speak of invocation, not evaluation, as being tail- 
recursive). An expansion rule is given for this in [Imperative] equivalent to: 

(BLOCK x) => x 

(BLOCK x . rest) => ((LAMBDA (DUMMY) (BLOCK . rest)) x) 
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This definition exploits the fact that SCHEME is evaluated in applicative order, 
and so will evaluate all arguments before applying a function to them. Thus, in 
the second subrule, x must be evaluated, and then the block of all the rest is. 
It is then clear from the first subrule that the last argument is evaluated 
"tail-recursively" . 

One problem with this definition is the occurrence of the variable DUMMY, 
which must be chosen so as not to conflict with any variable used by the user. 
This we refer to as the "GENSYM problem", in honor of the traditional LISP 
function which creates a "fresh" symbol. It would be nicer to write the macro in 
such a way that no conflict could arise no matter what names were used by the 
user. There is indeed a way, which ALGOL programmers will recognize as 
equivalent to the use of "thunks", or call-by-name parameters: 

(BLOCK x) => x 

(BLOCK x . rest) => ((LAMBDA (A B) (B)) 

x 

(LAMBDA () (BLOCK . rest))) 

Consider carefully the meaning of the right-hand side of the second subrule. 
First the expression x and the (LAMBDA () ...) must be evaluated (it doesn't 
matter in which order!); the result of the latter is a function (that is, a 
closure), which is later invoked in order to evaluate the rest of the arguments. 
There can be no naming conflicts here, because the scope of the variables A and B 
(which is lexical ) does not contain any of the arguments to BLOCK written by the 
user. (We should note that we have been sloppy in speaking of the "arguments" to 
BLOCK, when BLOCK is properly speaking not a function at all, but merely a 
syntactic keyword used to recognize a situation where a syntactic rewriting rule 
is applicable. We would do better to speak of "argument expressions" or "macro 
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arguments", but we shall continue to be sloppy where no confusion should arise.) 
This is a technique which should be understood quite thoroughly, since it 
is the key to writing correct macro rules without any possibility of conflicts 
between names used by the user and those needed by the macro. As another 
example, let us consider the AND and OR constructs as used by most LISP systems. 
OR evaluates its arguments one by one, in order, returning the first non-NIL 
value obtained (without evaluating any of the following arguments), or NIL if all 
arguments produce NIL. AND is the dual to this; it returns NIL if any argument 
does, and otherwise the value of the last argument. A simple-minded approach to 
OR would be: 

(OR) => -NIL 

(OR x . rest) => (IF x x (OR . rest)) 

There is an objection to this, which is that the code for x is duplicated. Not 
only does this consume extra space, but it can execute erroneously if x has any 
side-effects. We must arrange to evaluate x only once, and then test its value: 

(OR) => 'NIL 

(OR x . rest) => ((LAMBDA (V) (IF V V (OR . rest))) x) 

This certainly evaluates x only once, but admits a possible naming conflict 
between the variable V and any variables used by rest. This is avoided by the 
same technique used for BLOCK: 



(OR) => -NIL 

(OR x . rest) => ((LAMBDA (V R) (IF V V (R))) 

x 

(LAMBDA () (OR . rest))) 

Similarly, we can express AND as follows: 



32 



(AND) => 'T 

(AND x) => x 

(AND x . rest) => ((LAMBDA (V R) (IF V (R) 'NIL)) 

x 

(LAMBDA () (AND . rest))) 



(The macro rules are not precise duals because of the non-duality between NIL- 
ness and non-NIL-ness, and the requirement that a successful AND return the 
actual value of the last argument and not just T.) {Note Tail-Recursive OR) 

As yet another example, consider a modification to BLOCK to allow a 
limited form of assignment statement: if (v := x) appears as a statement in a 
block, it "assigns" a value to the variable v whose scope is the remainder of the 
block. Let us assume that such a statement cannot occur as the last statement of 
a block (it would be useless to have one in that position, as the variable would 
have a null scope). We can write the rule: 

(BLOCK x) => x 

(BLOCK (v := x) . rest) => ((LAMBDA (v) (BLOCK . rest)) x) 

(BLOCK x . rest) => ((LAMBDA (A B) (B)) 

x 

(LAMBDA () (BLOCK . rest))) 

The second subrule states that an "assignment" causes x to be evaluated and then 
bound to v, and that the variable v is visible to the rest of the block. 

We may think of := as a "sub-macro keyword" which is used to mark an 
expression as suitable for transformation, but only in the context of a certain 
larger transformation. This idea is easily extended to allow other 
constructions, such as "simultaneous assignments" of the form 

((varl varZ ... varn) := valuel value2 ... valuen) 

which first compute all the values and then assign to all the variables, and 
"exchange assignments" of the form (X : = : Y), as follows: 
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(BLOCK x) => x 
(BLOCK (v := x) . rest) 

=> ((LAMBDA (v) (BLOCK . rest)) x) 
(BLOCK (vars := . values) . rest) 

=> ((LAMBDA vars (BLOCK . rest)) . values) 
(BLOCK (x :=: y) . rest) 

=> ((LAMBDA (x y) (BLOCK . rest)) y x) 
(BLOCK x . rest) => ((LAMBDA (A B) (B)) 

x 
(LAMBDA () (BLOCK . rest))) 

Let us now consider a rule for the more complicated COND construct: 

(COND) => 'NIL 

(COND (x) . rest) => (OR x (COND . rest)) 

(COND (x . r) . rest) => (IF x (BLOCK . r) (COND . rest)) 

This defines the "extended" COND of modern LISP systems, which produces NIL if no 
clauses succeed, which returns the value of the predicate in the case of a 
singleton clause, and which allows more than one consequent in a clause. An 
important point here is that one can write these rules in terms of other macro 
constructs such as OR and BLOCK; moreover, any extensions to BLOCK, such as the 
limited assignment feature described above, are automatically inherited by COND. 
Thus with the above definition one could write 

(COND ((NUMBERP X) (Y := (SQRT X)) ( + Y (SQRT Y))) 
(T (HACK X))) 

where the scope of the variable Y is the remainder of the first COND clause. 

SCHEME also provides macros for such constructs as DO and PROG, all of 
which expand into similar kinds of code using LAMBDA, IF, and LABELS (see below). 
In particular, PROG permits the use of GO and RETURN in the usual manner. In 
this manner all the traditional imperative constructs are expressed in an 
applicative manner. {Note ASET' Is Imperative) 

None of this is particularly new; theoreticians have modelled imperative 
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constructs in these terms for years. What is new, we think, is the serious 
proposal that a practical interpreter and compiler can be designed for a language 
in which such models serve as the sole definitions of these imperative 
constructs. (Note Dijkstra's Opinion} This approach has both advantages and 
disadvantages. 

One advantage is that the base language is small. A simple-minded 
interpreter or compiler can be written in a few hours. (We have re-implemented 
the SCHEME interpreter from scratch a dozen times or more to test various 
representation strategies; this was practical only because of the small size of 
the language. Similarly, the CHEAPY compiler is fewer than ten pages of code, 
and could be rewritten in a day or less.) Once the basic interpreter is written, 
the macro definitions for all the complex constructs can be used without 
revision. Moreover, the same macro definitions can be used by both interpreter 
and compiler (or by several versions of interpreter and compiler!). Excepting 
the very few primitives such as LAMBDA and IF, it is not necessary to "implement 
a construct twice", once each in interpreter and compiler. 

Another advantage is that new macros are very easy to write (using 
facilities provided in SCHEME). One can easily invent a new kind of DO loop, for 
example, and implement it in SCHEME for both interpreter and all compilers in 
less than five minutes. (In practice such new control constructs, such as the 
ITERATE loop described in [Revised Report], are indeed installed within five to 
ten minutes of conception, in a routine manner.) 

A third advantage is that the attention of the compiler can be focused on 
the basic constructs. Rather than having specialized code for two dozen 
different constructs, the compiler can have much deeper knowledge about each of a 
few basic constructs. One might object that this "deeper knowledge" consists of 
recognizing the two dozen special cases represented by the separate constructs of 
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the former case. This is true to some extent. It is also true, however, that in 
the latter case such deep knowledge will carry over to any new constructs which 
are invented and represented as macros. 

Among the disadvantages of the macro approach are lack of speed and the 
discarding of information. Many people have objected that macros are of 
necessity slower than, say, the FSUBR implementation used by most LISP systems. 
This is true in many current interpretive implementations, but need not be true 
of compilers or more cleverly designed interpreters. Moreover, the FSUBR 
implementation is not general; it is very hard for a user to write a meaningful 
FSUBR and then describe to the compiler the best way to compile it. The macro 
approach handles this difficulty automatically. We do not object to the use of 
the FSUBR mechanism as a special-case -speed hack" to improve the performance of 
an interpreter, but we insist on recognizing the fact that it is not as generally 
useful as the macro approach. 

Another objection relating to speed is that the macros produce convoluted 
code involving the temporary creation and subsequent invocation of many closures. 
We feel, first of all, that the macro writer should concern himself more with 
producing correct code than fast code. Furthermore, convolutedness can be 
eliminated by a few simple optimization techniques in the compiler, to be 
discussed below. Finally, function calls need not be as expensive as is 
popularly supposed. [Steele] 

Information is discarded by macros in the situation, for example, where a 
DO macro expands into a large mess that is not obviously a simple loop; later 
compiler analysis must recover this information. This is indeed a problem. We 
feel that the compiler is probably better off having to recover the information 
anyway, since a deep analysis allows it to catch other loops which the user did 
not use DO to express for one reason or another. Another is the possibility that 
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DO could leave clues around in the form of declarations if desired. 

Another difficulty with the discarding of information is the issuing of 
meaningful diagnostic messages. The user would prefer to see diagnostics mention 
the originally-written source constructs, rather than the constructs into which 
the macros expanded. (An example of this problem from another LISP compiler is 
that it may convert (NEMQ X '(A B C)) into (OR (EQ X 'A) (EQ X 'B) (EQ X 'C)); 
when by the same rule it converts (MEMQ X '(A)) (a piece of code generated by a 
macro) into (OR (EQ X 'A)) f it later issues a warning that an OR had only one 
subform.) This problem can be partially circumvented if the responsibility for 
syntax-checking is placed on the macro definition at each level of expansion. 
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I-i The Imperative Treatment of Applicative Constructs 

Given the characteristics of lexical scoping and tail-recursive 
invocations, it is possible to assign a peculiarly imperative interpretation to 
the applicative constructs of SCHEME, which consists primarily of treating a 
function call as a GOTO. More generally, a function call is a GOTO that can pass 
one or more items to its target; the special case of passing no arguments is 
precisely a GOTO. It is never necessary for a function call to save a return 
address of any kind. It is true that return addresses are generated, but we 
adopt one of two other points of view, depending on context. One is that the 
return address, plus any other data needed to carry on the computation after the 
called function has returned (such as previously computed intermediate values and 
other return addresses) are considered to be packaged up into an additional 
argument (the continuation) which is passed to the target. This lends itself to 
a non-functional interpretation of LAMBDA, and a method of expressing programs 
called the continuation-passing style (similar to the message-passing actors 
paradigm [Hewitt]), to be discussed further below. The other view, more 
intuitive in terms of the traditional stack implementation, is that the return 
address should be pushed before evaluating arguments rather than before calling a 
function. This view leads to a more uniform function-calling discipline, and is 
discussed in [Declarative] and [Steele]. 

We are led by these points of view to consider a compilation strategy in 
which function calling is to be considered very cheap (unlike the situation with 
PL/I and ALGOL, where programmers avoid procedure calls like the plague -- see 
[Steele] for a discussion of this). In this light the code produced by the 
sample macros above does not seem inefficient, or even particularly convoluted. 
Consider the expansion of (OR a b c): 
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((LAMBDA (V R) (IF V V (R))) 
a 

(LAMBDA () ((LAMBDA (V R) (IF V V (R))) 
b 

(LAMBDA () ((LAMBDA (V R) (IF V V (R))) 
c 
(LAMBDA () 'NIL)))))) 

Then we might imagine the following (slightly contrived) compilation scenario. 
First, for expository purposes, we shall rename the variables in order to be able 
to distinguish them. 

((LAMBDA (VI Rl) (IF VI VI (Rl))) 
a 

(LAMBDA () ((LAMBDA (V2 R2) (IF V2 V2 (R2))) 
b 

(LAMBDA () ((LAMBDA (V3 R3) (IF V3 V3 (R3))) 
c 
(LAMBDA () 'NIL)))))) 

We shall assign a generated name to each LAMBDA-expression, which we shall notate 
by writing the name after the word LAMBDA. These names will be used as tags in 
the output code. 

((LAMBDA namel (VI Rl) (IF VI VI (Rl))) 
a 

(LAMBDA name2 () ((LAMBDA name3 (V2 R2) (IF V2 V2 (R2))) 
b 
(LAMBDA name4 () ((LAMBDA name5 (V3 R3) 

(IF V3 V3 (R3))) 
c 
(LAMBDA name6 () 'NIL)))))) 

Next, a simple analysis shows that the variables Rl, R2, and R3 always denote the 
LAMBDA-expressions named name2, name4, and name6, respectively. Now an optimizer 
might simply have substituted these values into the bodies of namel, name3, and 
name 5 using the rule of beta-conversion, but we shall not apply that technique 
here. Instead we shall compile the six functions in a straightforward manner. 
We make use of the additional fact that all six functions are closed in identical 
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environments (we count two environments as identical if they involve the same 
variable bindings, regardless of the number of "frames" involved; that is, the 
environment is the same inside and outside a (LAMBDA () ...)). Assume a simple 
target machine with argument registers called regl, reg2, etc. 

main: <code for a> jresult in regl 

LOAD regZ,[name2] ;[nameZ] is the closure for nameZ 
CALL-FUNCTION Z,[namel] ;call namel with Z arguments 

namel: JUMP-IF-NIL regl,namela 

RETURN ; return the value in regl 

name la: CALL-FUNCTION 0,regZ ;call function in regZ, arguments 

nameZ: <code for b> jresult in regl 

LOAD regZ,[name4] ;[name4] is the closure for name 4 
CALL-FUNCTION Z,[name3] ;call name 3 with Z arguments 

name3: JUMP-IF-NIL regl,name3a 

RETURN ; return the value in regl 

name3a: CALL-FUNCTION 0,regZ ;call function in regZ, arguments 

name4: <code for c> jresult in regl 

LOAD reg2,[name6] ;[name6] is the closure for name 6 
CALL-FUNCTION Z,[name5] jcall name 5 with Z arguments 

name5: JUMP-IF-NIL regl,name5a 

RETURN ; return the value in regl 

name5a: CALL-FUNCTION 0,regZ jcall function in regZ, arguments 

name6: LOAD regl,'NIL jconstant NIL in regl 
RETURN 

Now we make use of our knowledge that certain variables always denote certain 
functions, and convert CALL-FUNCTION of a known function to a simple GOTO. (We 
have actually done things backwards here; in practice this knowledge is used 
before generating any code. We have fudged over this issue here, but will return 
to it later. Our purpose here is merely to demonstrate the treatment of function 
calls as GOTOs. ) 

main: <code for a> jresult in regl 

LOAD regZ,[nameZ] ;[nameZ] is the closure for nameZ 
GOTO namel 
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namel: JUMP-IF-NIL r eg 1, name la 

RETURN ; return the value in regl 

name la: GOTO name 2 

name2: <code for b> ;result in regl 

LOAD reg2,[name4] ;[name4] is the closure for name 4 
GOTO name3 

name 3: JUMP-IF-NIL regl,name3a 

RETURN ; return the value in regl 

name3a: GOTO name 4 

name4: <code for c> ;result in regl 

LOAD reg2,[name6] ;[name6] is the closure for name 6 
GOTO name 5 

name5: JUMP-IF-NIL regl,name5a 

RETURN ; return the value in regl 

name5a: GOTO name 6 

name 6: LOAD regl,'NIL jconstant NIL in regl 
RETURN 

The construction [foo] indicates the creation of a closure for foo in the current 
environment. This will actually require additional instructions, but we shall 
ignore the mechanics of this for now since analysis will remove the need for the 
construction in this case. The fact that the only references to the variables 
Rl, R2, and R3 are function calls can be detected and the unnecessary LOAD 
instructions eliminated. (Once again, this would actually be determined ahead of 
time, and no LOAD instructions would be generated in the first place. All of 
this is determined by a general pre-analysis, rather than a peephole post-pass.) 
Moreover, a GOTO to a tag which immediately follows the GOTO can be eliminated. 
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main: <code for a> jresult in regl 

namel: JUMP-IF-NIL regl.namela 

R ETURN ; return the value in regl 

name la: 

nameZ: <code for b> jresult in regl 

name3: JUMP-IF-NIL regl,name3a 

RETURN ; return the value in regl 

name3a: 

name4: <code for c> jresult in regl 

nameS: JUMP-IF-NIL regl,name5a 

RETURN ; return the value in regl 

name5a: 

nameG: LOAD regl, 'NIL jconstant NIL in regl 

RETURN 

This code is in fact about what one would expect out of an ordinary LISP 
compiler. (There is admittedly room for a little more improvement.) RABBIT 
indeed produces code of essentially this form, by the method of analysis outlined 
here. 

Similar considerations hold for the BLOCK macro. Consider the expression 
(BLOCK a b c); conceptually this should perform a, b, and c sequentially. Let 
us examine the code produced: 

((LAMBDA (A B) (B)) 
a 

(LAMBDA () ((LAMBDA (A B) (B)) 
b 
(LAMBDA () c)))) 



Renaming the variables and assigning names to LAMBDA-express 

((LAMBDA namel (Al Bl) (Bl)) 
a 

(LAMBDA nameZ () ((LAMBDA name3 (A2 B2) (BZ)) 
b 
(LAMBDA name4 () c)))) 

Producing code for the functions: 



ions 
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main: <code for a> 

LOAD reg2,[name2] 
CALL-FUNCTION 2, [namel] 

namel: CALL-FUNCTION 0,reg2 

name2: <code for b> 

LOAD reg2,[name4] 
CALL-FUNCTION 2,[name3] 

name3: CALL-FUNCTION 0.reg2 

name4: <code for c> 
RETURN 

Turning general function calls into direct GO's, on the basis of analysis of what 
variables must refer to constant functions: 

main: <code for a> 

LOAD reg2,[name2] 
GOTO namel 

namel : GOTO name2 

name 2: <code for b> 

LOAD reg2,[name4] 
GOTO name 3 

name 3: GOTO name 4 

name 4: <code for c> 
RETURN 

Eliminating useless GOTO and LOAD instructions: 

main: <code for a> 
namel : 

name2: <code for b> 
name 3: 

name4: <code for c> 
RETURN 

What more could one ask for? 

Notice that this has fallen out of a general strategy involving only an 
approach to compiling function calls , and has involved no special knowledge of OR 
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or BLOCK not encoded in the macro rules. The cases shown so far are actually 
special cases of a more general approach, special in that all the conceptual 
closures involved are closed in the same environment, and called from places that 
have not disturbed that environment, but only used "registers". In the more 
general case, the environments of caller and called function will be different. 
This divides into two subcases, corresponding to whether the closure was created 
by a simple LAMBDA or by a LABELS construction. The latter involves circular 
references, and so is somewhat more complicated; but it is easy to show that in 
the former case the environment of the caller must be that of the (known) called 
function, possibly with additional values added on. This is a consequence of 
lexical scoping. As a result, the function call can be compiled as a GOTO 
preceded by an environment adjustment which consists merely of lopping off some 
leading portion of the current one (intuitively, one simply "pops the unnecessary 
crud off the stack"). LABELS-closed functions also can be treated in this way, 
if one closes all the functions in the same way (which RABBIT presently does, but 
this is not always desirable). If one does, then it is easy to see the effect of 
expanding a PROG into a giant LABELS as outlined in [Imperative] and elsewhere: 
normally, a GOTO to a tag at the same level of PROG will involve no adjustment of 
environment, and so compile into a simple GOTO instruction, whereas a GOTO to a 
tag at an outer level of PROG probably will involve adjusting the environment 
from that of the inner PROG to that of the outer. All of this falls out of the 
proper imperative treatment of function calls. 
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8. Compi lation Strategy 

The overall approach RABBIT takes to the compilation of SCHEME code may 
be summarized as follows: 

(1) Alpha-conversion (renaming of variables) and macro-expansion 
(expansion of syntactic rewrite rules). 

(2) Preliminary analysis (variable references, "trivial" expressions, and 
side effects). 

(3) Optimization (meta-evaluation) . 

(4) Conversion to continuation-passing style. 

(5) Environment and closure analysis. 

(6) Code generation. 

During (1) a data structure is built which is structurally a copy of the user 
program but in which all variables have been renamed and in which at each "node" 
of the program tree are additional slots for extra information. These slots are 
filled in during (2). In (3) the topology of the structure may be modified to 
reflect transformations made to the program; routines from (2) may be called to 
update the information slots. In (4) a new data structure is contructed from the 
old one, radically different in structure, but nevertheless also tree-like in 
form. During (5) information is added to slots in the second structure. In (6) 
this information is used to produce the final code. 
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A. Alpha-conversion and macro-expansion 

In this phase a copy of the user program is made. The user program is 
conceptually a tree structure; each node is one of several kinds of construct 
(constant, variable, LAMBDA-expression, IF-expression, combination, etc.). Some 
kinds of nodes have subnodes; for example, a LAMBDA-expression node has a 
subnode representing the body, and a combination node has a subnode for each 
argument. The copying is performed in the obvious way by a recursive tree-walk. 
In the process all bound variables are renamed. Each bound variable is assigned 
a new generated name at the point of binding, and each node for a reference to a 
bound variable contains this generated name, not the original name. From this 
point on all variables are dealt with in terms of their new names. (This is 
possible because, as a consequence of lexical scoping, we can identify all 
references to each bound variable.) These new names are represented as atomic 
symbols, and the property lists of these symbols will later be used to store 
information about the variables. 

As each subform of the user program is examined, a check is made for a 
macro call, which is a list whose car is an atomic symbol with one of several 
macro-defining properties. When such a call is encountered, the macro call is 
expanded, and the tree-walk is resumed on the code returned by the expansion 
process. 
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B. Preliminary analysis 

The preliminary analysis ("phase 1") is in three passes, each involving a 
tree-walk of the node structure, filling in information slots at each node. (Two 
passes would have sufficed, but for reasons of clarity and modularity there is 
one pass for each type of analysis.) 

The first pass (ENV-ANALYZE) analyzes variable references. For each node 
we determine the set of all local (bound) variables referenced at or below that 
node. For example, for a variable-reference node this set is empty (for a global 
variable), or the singleton set of the variable itself (for a local variable); 
for a LAMBDA-expression, it is the set for its body minus the variables bound by 
that LAMBDA-expression; for an IF-expression, it is the union of the sets for 
the predicate, consequent, and alternative; and so on. We also compute for each 
node the set of bound variables which appear in an ASET' at or below the node. 
(This set will be a subset of the first set, but no non-trivial use of this 
property is used in this pass.) Finally, for each variable we store several 
properties on its property list, including a list of all nodes which reference 
the variable (for "reading") and a list of of all ASET' nodes which modify the 
variable. These lists are built incrementally, with an entry added as each 
reference is encountered during the tree walk. (This exemplifies the general 
strategy for passing data around; any information which cannot be passed 
conveniently up and down the tree, but which must jump laterally between 
branches, is accumulated on the property lists of the variables. It may appear 
to be "lucky" that all such information has to do with variables, but this is 
actually an extremely deep property of our notation. The entire point of using 
identifiers is to relate textually separated constructions. We depend on alpha- 
conversion to give all variables distinct names (by "names" we really mean 
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"compile-time data structures") so that the information for variables which the 
user happened to give the same name will not be confused.) 

The second pass (TRIV-ANALYZE) locates "trivial" portions of the program. 
(Cf. [Wand and Friedman].) Constants and variables are trivial; an IF- 
expressions is trivial iff the predicate, consequent, and alternative are all 
trivial; an ASET' is trivial iff its body is trivial; a combination is trivial 
iff the function is either a global variable which is the name of a MacLISP 
primitive, or a LAMBDA-expression whose body is trivial, and the arguments are 
all trivial. LAMBDA-expressions, LABELS-forms (which contain LAMBDA- 
expressions), and CATCH-forms are never trivial. The idea is that a trivial 
expression is one that MacLISP could evaluate itself, without benefit of SCHEME 
control structures. (No denigration of MacLISP's ability is intended by this 
terminology!) Note particularly the two special cases of combinations 
distinguished here (in which the function position contains either the name of a 
MacLISP primitive or a LAMBDA-expression); they are very important, and shall be 
referred to respectively as TRIVFN-combinations and LAMBDA-combinations. 

The third pass (EFFS-ANALYZE) analyzes the possible side-effects caused 
by each node, and the side-effects which could affect it. It actually produces 
two sets of analyses, one liberal and one conservative. Where there is any 
uncertainty as to what side-effects may be involved, it assumes none in one case 
and all possible in the other. The liberal estimation is used only to issue 
error messages to the user about possible conflicts which might result as a 
consequence of the freedom to evaluate arguments to combinations in any order. 
The user is given the benefit of doubt, and warned only of a "provable" conflict. 
(Actually, the "proof" is a little sloppy, and can err in both directions, but in 
practice it has issued no false alarms and a number of helpful warnings.) The 
conservative estimation is used by the optimizer, which will move expressions 



48 

only if it can prove that there will be no conflict. 

Side effects are grouped into classes: ASET, RPLACA and RPLACD (which 
are considered distinct), FILE (input/output operations), and CONS. These are 
not intended to be exhaustive; there is also an internal notation for "any side- 
effect whatever". The use of classes enables the analysis to realize, for 
example, that RPLACA cannot affect the value of a variable per se . There is a 
moderately large body of data in RABBIT about the side-effects of NacLISP 
primitive functions. For example, CAR, CDR, CAAR, CADR, and so on are known not 
to have side-effects, and to be respectively affected only by RPLACA, RPLACD, 
RPLACA, RPLACA or RPLACD, and so on. Similarly, RABBIT knows that ASET' affects 
the values of variables, but cannot affect the outcome of a CAR operation. (It 
may affect the value of the expression (CAR X), but only because a variable 
reference is a subnode of the combination. The effects, or affectability, of a 
combination are the union of the effects, or affectibility, of all arguments plus 
those of the function.) The CONS side-effect is a special case. This side- 
effect cannot affect anything, and two instances of it may be performed in the 
"wrong" order, but performing a single instance twice will produce distinct (as 
determined by EQ) and therefore incorrect results. In particular, closures of 
LAMBDA-expressions involve the CONS side-effect. (The definition of SCHEME says 
nothing about whether EQ is a valid operation on closures, but in general it is 
not a good idea to produce unnecessary multiple copies.) On the other hand, 
LAMBDA-expressions occurring in function position of a LAMBDA-combination do not 
incur the CONS side-effect. The CONS side-effect is given special treatment in 
the optimizer. {Note Side-Effect Classifications} 
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C. Optimization 

Once the preliminary analysis is done, the optimization phase performs 
certain transformations on the code. The result is an equivalent program which 
will (probably) compile into more efficient code. This new program is itself 
structurally a valid SCHEME program; that is, all transformations are contained 
within the language. The transformations are thus similar to those performed on 
macro calls, consisting of a syntactic rewriting of an expression, except that 
the situations where such transformations are applicable are more easily 
recognized in the case of macro calls. It should be clear that the optimizer and 
the macro-functions are conceptually at the same level in that they may be 
written in the same meta-language that operates on representations of SCHEME 
programs. {Note Non-deterministic Optimization} 

The simplest transformation is that a combination whose function position 
contains a LAMBDA-expression and which has no arguments can be replaced by the 
body of the LAMBDA-expression: 

((LAMBDA () body)) => body 

Another is that, in the case of a LAMBDA- combination, if some parameter of the 
LAMBDA-expression is not referenced and the corresponding argument can be proved 
to have no side-effects (with an exception discussed below), then the parameter 
and argument can be eliminated: 

((LAMBDA (xl x2 x3) body) al a2 a3) 

=> ((LAMBDA (xl x3) body) al a3) 
if xZ is unreferenced in body and a2 has no side-effects 

Repeated applications of this rule can lead to the preceding case. 

A third rule is that, in a LAMBDA-combination, an argument can be 
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substituted for one or more occurrences of a parameter in the body of the LAMBDA- 
expression. (This rule is related to the view of LAMBDA as a renaming operator 
discussed in [Declarative], and together with the two preceding rules make up the 
rule of beta-conversion.) Such a substitution is permissible only if (a) either 
the parameter is referred to only once or the argument has no side effects, and 
(b) the substitution will not alter the order in which expressions are evaluated 
in such a way as to allow possible side-effects to produce different results. 
Before performing the substitution it is necessary to show that side-effects will 
not interfere in this manner. This issue is discussed in [Allen and Cocke], 
[Geschke], and [Wulf], and characterized more accurately in [Standish]. There is 
also some difficulty if the parameter appears in an ASET'. Presently RABBIT does 
not attempt any form of substitution for such a parameter. (ASET' is so seldom 
used in SCHEME programs that this restriction makes very little difference.) 

This third rule creates an exception to the second. If an argument with 
a side effect is referred to once, and is substituted for the reference, then the 
second rule must be invoked to eliminate the original occurrence of the argument, 
so that the side effect will not occur twice. This requires a little collusion 
between the two rules. 

Even if such a substitution is permissible, it is not always desirable; 
time/space tradeoffs are involved. The current heuristic is that a substitution 
is desirable if (1) the parameter is referred to only once; or (2) the argument 
to be substituted in is a constant or variable; or (3) the argument is a LAMBDA- 
expression whose body is (3a) a constant, or (3b) a variable reference, or (3c) a 
combination which has no more arguments than the LAMBDA-expression requires and 
for which the arguments are all constants or variables. This heuristic was 
designed to be as conservative as possible while handling most cases which arise 
from typical macro-expansions. 
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The case where the expression substituted for a variable is a LAMBDA- 
expression constitutes an instance of procedure integration [Allen and Cocke]. 
The more general kind of procedure integration proposed in [Declarative], which 
would involve block compilation of several user functions, and possibly also user 
declarations or data type analysis, has not been implemented yet. 

It would be possible to substitute a LAMBDA-expression for a variable 
reference in the case of a variable bound by a LABELS. This might be useful in 
the case of a LABELS produced by a simple-minded PROG macro, which produced a 
labelled function for each statement of the PROG; in such a case most labelled 
functions would be referred to only once. We have not implemented this yet in 
RABBIT. {Note Loop Unrolling} 

Currently there is not any attempt to perform the inverse of beta- 
conversion. This process would be that of locating common subexpressions of some 
single large expression, making that large expression the body of a LAMBDA- 
expression of one parameter, replacing all occurrences of the common 
subexpression by a reference to the parameter, and replacing the large expression 
by a combination whose function position contained the LAMBDA-expression and 
whose argument was a copy of the common subexpression. More generally, several 
common subexpressions could be isolated at once and made into several parameters 
of the LAMBDA-expression. For example, consider: 

(LAMBDA (A B C) 

(LIST (/ ( + (- B) (SQRT (- (* B 2) (* 4 A C)))) 
(* 2 A) ) 

(/ (- (- B) (SQRT (- rB2)(MA C)))) 
(*2A)))) 

Within the large expression (LIST ...) we might detect the common subexpressions 
<- B), (SQRT ...), and (* 2 A). Thus we would invent three parameters Ql, Q2, Q3 
and transform the expression into: 
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(LAMBDA (ABC) 

((LAMBDA (Ql Q2 Q3) 

(LIST (/ (+ Ql Q2) Q3) 

(/ (- Ql Q2) Q3))) 
(- B) 

(SQRT (- (~ B 2) (* 4 A C))) 
(* 2 A))) 

(There would be no problem of conflicting names as there is for macro rules, 

because we are operating on code for which all variables have already been 

renamed; Ql, Q2, and Q3 can be chosen as the next variables in the renaming 
sequence. ) 

This approach doesn't always work if side-effects are present; the 
abstracted (!) common subexpression may be evaluated too soon, or the wrong 
number of times. This can be solved by wrapping (LAMBDA () ®) around the common 
subexpression, and replacing references by a combination instead of a simple 
variable reference. For example: 

(IF (HAIRYP X) 

(BLOCK (PRINT • |Here is some hair:|) 

(PRINT X) 

(PRINT '| End of hair.|)) 
(BLOCK (PRINT ' |This one is bald:|) 

(PRINT X) 

(PRINT • |End of baldness. | ))) 

We could not transform it into this: 

((LAMBDA (Ql) 

(IF (HAIRYP X) 

(BLOCK (PRINT ' |Here is some hair:|) 
Ql 

(PRINT '|End of hair.|)) 
(BLOCK (PRINT ■ |This one is bald:|) 
Ql 

(PRINT '|End of baldness. | )))) 
(PRINT X)) 

because x would be printed before the appropriate leading message. Instead, we 
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transform it into: 



((LAMBDA (Ql) 

(IF (HAIRYP X) 

(BLOCK (PRINT '|Here is some hair:|) 
(Ql) 

(PRINT '| End of hair.|)) 
(BLOCK (PRINT '|This one is bald:|) 
(Ql) 

(PRINT '|End of baldness*!)))) 
(LAMBDA () (PRINT X))) 



This is similar to the call-by-name trick used in writing macro rules. 

A more general transformation would detect nearly common subexpressions 
as follows: 

((LAMBDA (Ql) 

(IF (HAIRYP X) 

(Ql • | Here is some hair:| 

' |End of hair. |) 
(Ql '| This one is bald:| 
'I End of baldness.l))) 
(LAMBDA (QZ Q3) 

(BLOCK (PRINT Q2) (PRINT X) (PRINT Q3)))) 

In this way we can express the notion of subroutinization. 
(Note Subroutinization} 

We point out these possibilities despite the fact that they have not been 
implemented in RABBIT because the problem of isolating common subexpressions 
seems not to have been expressed in quite this way in the literature on 
compilation strategies. We might speculate that this is because most compilers 
which use complex optimization strategies have been for ALGOL-like languages 
which do not treat functions as full-fledged data objects, or even permit the 
writing of "anonymous" functions in functions calls as LISP does. 

RABBIT does perform folding on constant expressions [Allen and Cocke]; 
that is, any combination whose function is a side-effect-less MacLISP primitive 
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and whose arguments are all constants is replaced by the result of applying the 
primitive to the arguments. There is presently no attempt to do the same thing 
for side-effect-less SCHEME functions, although this is conceptually no problem. 
Finally, there are two transformations on IF expressions. One is simply 
that an IF expression with a constant predicate is simplified to its consequent 
or alternative (resulting in elimination of dead code). The other was adapted 
from [Standish], which does not have this precise transformation listed, but 
gives a more general rule. In its original form this transformation is: 

(IF (IF a b c) d e) => (IF a (IF b d e) (IF c d e)) 

One problem with this is that the code for d and e is duplicated. This can be 
avoided by the use of LAMBDA-expressions: 

((LAMBDA (Ql Q2) 
(IF a 

(IF b (Ql) (Q2)) 
(IF c (Ql) (Q2)))) 
(LAMBDA () d) 
(LAMBDA () e)) 

As before, there is no problem of name conflicts with Ql and Q2. While this code 
may appear unnecessarily complex, the calls to the functions Ql and Q2 will 
typically, as shown above, be compiled as simple GOTO's. As an example, consider 
the expression: 

(IF (AND PRED1 PRED2) (PRINT 'WIN) (ERROR 'LOSE)) 
Expansion of the AND macro will result in: 

(IF ((LAMBDA (V R) (IF V (R) 'NIL)) 

PRED1 

(LAMBDA () PRED2)) 
(PRINT 'WIN) 
(ERROR 'LOSE)) 
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(For expository clarity we will not bother to rename all the variables, inasmuch 
as they are already distinct.) Because V and R have only one reference apiece 
(and there are no possible interfering side-effects), the corresponding arguments 
can be substituted for them. 

(IF ((LAMBDA (V R) (IF PRED1 ((LAMBDA () PRED2)) 'NIL)) 

PRED1 

(LAMBDA () PREDZ)) 
(PRINT 'WIN) 
(ERROR 'LOSE)) 

Now, since V and R have no referents at all, they and the corresponding arguments 
can be eliminated, since the arguments have no side-effects. 

(IF ((LAMBDA () (IF PRED1 ((LAMBDA () PRED2)) 'NIL))) 
(PRINT 'WIN) 
(ERROR 'LOSE)) 

Next, the combination ((LAMBDA ()...)) is eliminated in two places: 

(IF (IF PRED1 PRED2 'NIL) 
(PRINT 'WIN) 
(ERROR 'LOSE)) 

Now, the transformation on the nested IF's: 

((LAMBDA (Ql Q2) 

(IF PRED1 

(IF PRED2 (Ql) (Q2)) 
(IF 'NIL (Ql) (Q2)))) 
(LAMBDA () (PRINT 'WIN)) 
(LAMBDA () (ERROR 'LOSE))) 

Now one IF has a constant predicate and can be simplified: 
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((LANBDA (Ql Q2) 

(IF PRED1 

(IF PRED2 (Ql) (Q2)) 
(Q2))) 
(LAMBDA () (PRINT 'WIN)) 
(LAMBDA () (ERROR 'LOSE))) 

The variable Ql has only one referent, and so we substitute in, eliminate the 
variable and argument, and collapse a ((LAMBDA () ..)): 

((LAMBDA (Q2) 

(IF PRED1 

(IF PRED2 (PRINT 'WIN) (Q2)) 
(Q2))) 
(LAMBDA () (ERROR 'LOSE))) 

Recalling that (Q2) is, in effect, a GOTO branching to the common piece of code, 
and that by virtue of later analysis no actual closure will be created for either 
LANBDA-expression, this result is quite reasonable. {Note Evaluation for 
Control} 



D. Conversion to Continuation-Passing Style 

This phase is the real meat of the compilation process. It is of 
interest primarily in that it transforms a program written in SCHEME into an 
equivalent program (the continuation-passing-style version , or CPS version ), 
written in a language isomorphic to a subset of SCHEME with the property that 
interpreting it requires no control stack or other unbounded temporary storage 
and no decis ions as to the order of evaluation of (non-trivial) subexpressions . 
The importance of these properties cannot be overemphasized. The fact that it is 
essentially a subset of SCHEME implies that its semantics are as clean, elegant, 
and well-understood as those of the original language. It is easy to build an 
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interpreter for this subset, given the existence of a SCHEME interpreter, which 
can execute the transformed program directly at this level. This cannot be said 
for other traditional intermediate compilation forms; building an interpreter 
for triples [Gries], for example, would be a tremendous undertaking. The 
continuation-passing version expresses all temporary intermediate results 
explicitly as variables appearing in the program text, and all temporary control 
structure in the form of LANBDA-expressions (that is, closures). It is explicit 
in directing the order of operations; there is no non-trivial freedom at any 
point in the evaluation process. 

As a result, once the CPS version of a program has been generated, the 
remainder of the compilation process is fairly easy. There is a reasonably 
direct correspondence between constructs in the CPS language and "machine- 
language" operations (if one assumes CONS to be a "machine-language" primitive 
for augmenting environment structure, which we do). The later passes are 
complicated only by the desire to handle certain special cases in an optimal 
manner, most particularly the case of a function call whose function position 
contains a variable which can be determined to refer to a known LAMBDA- 
expression. This analysis must be done after the CPS conversion because it 
applies to continuations as well as LAMBDA-expressions written by the user or 
generated by macros. 

The CPS language differs from SCHEME in only two respects. First, each 
primitive function is different, in that it returns no value; instead, it 
accepts an additional argument, the continuation , which must be a "function" of 
one argument, and by definition invokes the continuation tail-recursively, giving 
it as an argument the computed "value" of the primitive function. We extend this 
by convention to non-primitive functions, and so all functions are considered to 
take a continuation as one of its arguments (by convention the first -- this 
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differs from the convention used in the examples in [SCHEME], [Imperative], and 
[Declarative]). Continuations, however, do not themselves take continuations as 
arguments. 

Second, no combination may have a non-trivial argument. In strict 
continuation-passing style (as described in note {Evalorder} of [Imperative]), 
this implies that no combination can have another combination as an argument, or 
an IF-expression with a non-trivial consequent or alternative, etc. We relax 
this to allow as arguments any trivial form in the sense described above for the 
preliminary triviality analysis. We note that, in principle, trivial expressions 
require no unbounded space on the part of the SCHEME interpreter to evaluate, and 
that the compiler need not worry about control and environment issues for trivial 
expressions. (Trivial expressions do require unbounded space on the part of the 
MacLISP run-time system, because the point of the triviality analysis is that 
trivial expressions can be handled by MacLISP! The question of what should be 
considered trivial is actually a function of the characteristics of our target 
machine. We note that, at the least, variables, constants, and LAMBDA- 
expressions should be considered trivial. That the preliminary triviality 
analysis does not consider LAMBDA-expressions trivial is a trick so that all 
closures will be processed by the CPS-conversion process, and the fact that we 
call it a triviality analysis is a white lie. See, however, 

[Wand and Friedman].) 

The effect of the restriction on combinations is startling. On the one 
hand, they do not so constrain the language as to be useless; on the other hand, 
they require a radically different approach to the expression of algorithms. It 
is easy to see that no control stack is necessary to evaluate such code, for, as 
mentioned in [SCHEME], control stack is used only to keep track of intermediate 
values and return addresses, and these arise only in the case of combinations 
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with non-trivial arguments, and conditionals with non-trivial predicates. 

An algorithm for converting SCHEME programs to continuation-passing style 
was given in Appendix A of [Declarative]. {Note Old CPS Algorithm) The one used 
in RABBIT is almost identical, except that for the convenience of the code- 
generation phase a distinction is made between ordinary LAMBDA-expressions and 
continuations, and between combinations used to invoke "functions" and those used 
to invoke continuations. These sets can in fact be consistently distinguished, 
and it affords a certain amount of error-checking; for example, a LAMBDA- 
expression should never appear in the "function" position of a continuation- 
invoking combination. Another fine point is that ASET' can never be applied to a 
variable bound by a continuation. Except for such differences arising from their 
uses, the two sets of constructs are treated more or less identically in later 
phases. An additional difference between the algorithm in [Declarative] and the 
one in RABBIT is that trivial subforms are treated as single nodes in the CPS 
version; these nodes have pointers to the non-CPS versions of the relevant code, 
which are largely ignored by later processing until the final code is to be 
generated. 

It must be emphasized that there is not necessarily a unique CPS version 
for a given SCHEME program; there is as much freedom as there is in the original 
program to re-order the evaluation of subexpressions. In the course of the 
conversion process decisions must be made as to what order to use in evaluating 
arguments to combinations. The current decision procedure is fairly simple- 
minded, consisting mostly of not making copies of constants and the values of 
variables. The point here, as earlier, is not so much that RABBIT has a much 
better algorithm than other compilers as that it has a far cleaner way of 
expressing the result. (For a complex decision procedure for argument ordering, 
see [Wulf].) {Note Non-deterministic CPS Conversion) 
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E. Environment and closure analysis 

This phase consists of four passes over the CPS version of the program. 
As with the earlier preliminary analysis, each pass determines one related set of 
information and attaches this information to nodes of the program tree and to 
property lists. 

The first pass (CENV-ANALYZE) analyzes variable references for the CPS 
version in a manner similar to that of the first pass of the preliminary 
analysis. The results of this previous analysis are used here in the case of 
trivial expressions; with this exception the analysis is redone completely, 
because additional variables are introduced by the CPS conversion. (None of 
these new variables can appear in an ASET\ however, and so the analysis of 
written variables need not be done over.) In addition, for each variable 
reference which does not occur in the function position of a combination, we mark 
that variable with a non-nil VARIABLE-REFP property, used later to determine 
whether closures need to be created for known functions. 

The second pass (BIND-ANALYZE) determines for each LAMBDA-expression 
whether a closure will be needed for it at run time. There are three 
possibilities: 

(1) If the function denoted by the LAMBDA-expression is bound to some 
variable, and that variable is referenced other than in function position, 
then the closure is being treated as data, and must be a full (standard CBETA 
format) closure. If the function itself occurs in non-function position other 
than in a LAMBDA-combination, it must be fully closed. 

(2) If the closure is bound to some variable, and that variable is referenced 



61 



only in function position, but some of these references occur within other 
partially or fully closed functions, then this function must be partially 
closed. By this we mean that the environment for the closure must be "consed 
up", but no pointer to the code need be added on as for a full closure. This 
function will always be called from places that know the name of the function 
and so can just perform a GO to the code, but those such places which are 
within closures must have a complete copy of the necessary environment. 
(3) In other cases (functions bound to variables referenced only in function 
position and never within a closed function, or functions occurring in 
function position of LAMBDA-combinations), the function need not be closed. 
This is because the environment can always be fully recovered from the 
environment at the point of call. 

In order to determine this information, it is necessary to determine, for 
each node, the set of variables referred to from within closed functions at or 
below that node. Thus this process and the process of determining which 
functions to close are highly interdependent, and so must be accomplished in a 
single pass. 

The second pass also generates a name for each LAMBDA-expression (to be 
used as tags in the output code, as discussed in the examples earlier), and for 
non-closed functions determines which variables will be assigned to "registers" 
or "memory locations". For these non-closed functions it may determine that 
certain variables need not be assigned locations at all (they are never 
referenced, or are bound to other non-closed functions - the latter circumstance 
is important when a variable is known to denote a certain function, but the 
optimizer was too conservative to perform beta-substitution for fear of 
duplicating code and thus wasting space). Finally, for each variable which is 
(logically, at run time not necessarily actually) bound to a known function (and 
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which never appears in an ASET'), a property KNOWN-FUNCTION is put on its 
property list whose value is the node of the CPS version of that function. This 
property is used later in generating code for combinations in whose function 
positions such variables appear. 

The third pass (DEPTH-ANALYZE) examines each LAMBDA-expression and 
determines the precise registers or memory locations through which arguments are 
to be passed to each. Closed functions take their arguments in the standard 
registers described earlier; non-closed functions may take their arguments in 
any desired places. (Partially closed functions could also, but there is little 
advantage to this.) The allocation strategy in RABBIT for non-closed functions 
is presently merely stack-like; the deeper the nesting of a function, the higher 
in the ordering of "registers" and "memory locations" are the locations assigned. 
(See e.g. [Johnsson] for a detailed analysis of the register allocation problem.) 

The fourth pass (CLOSE-ANALYZE) determines the precise format of the 
environment to be constructed for each closure. That is, while the third pass 
handles cases for which stack-allocation of environments will suffice, the fourth 
pass deals with heap-allocated environment structures. Recall that the format of 
an environment can be completely arbitrary, since the only code which can 
possibly refer to an environment is the function for a closure of which the 
environment was created. Therefore the compiler which compiles that function has 
a free hand in determining the structure of the environment. For the sake of 
simplicity, RABBIT chooses to generate code which represents environments simply 
as a list of variable values. Several environment lists may share a common tail. 
The environment for a closure need not contain any variables not needed by the 
closed function, but it may if this will allow the sharing of a single structure 
among several closures. (There is a problem with variables modified by ASET' 
which is discussed in the next paragraph.) 
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For each LAMBDA-expression which must be closed, three sets of variables 
are computed: (1) the variables which will already be in the "consed" 
environment structure at the time the closure is to be created; (2) additional 
variables which must be added ("consed on") to the existing structure to create 
the closure (because at that point they are spread out in "registers") {Note 
Heap-Allocated Contours}; (3) variables which must be added to the environment 
immediately after entering the function because they must eventually be added in 
for closures later and they are referred to in ASET' constructs. The third set 
arises from a requirement that ASET' constructs must have a consistent effect, 
and confusion can arise if a variable's value can be in more than one place. If 
the value were allowed to be both in a "register" and in an environment 
structure, or in several different environment structures, then altering the 
value in one place would not affect the other places. To assure consistency, 
this third set is computed, and such variables must at run time be placed in an 
environment structure to be shared by all others which refer to such variables. 

For every LABELS statement a set of variables is computed which is the 
set of variables to be added to the existing environment on entry to the LABELS 
body, in order to share this new structure among all the closures to be created 
for the LABELS functions. 
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F. Code generation 

Given the foregoing analysis, the generation of code is straightforward, 
and largely consists of using the information already gathered to detect special 
cases. The special cases of interest relate almost entirely to function calls 
and closures (indeed, there is little else in the language for RABBIT'S 
purposes! ) . 

RABBIT has provision for "block compiling" a number of functions into a 
single module. This permits an optimization in which one function can transfer 
control directly to another without going through the "UUO handler". Even if 
several user functions are not compiled into a single module, this is still of 
advantage, because a single user function can produce a large number of output 
functions, as a consequence of the code-generation techniques. 

A module consists of a single MacLISP function whose body is a single 
PROG. This PROG has no local variables, but does have a number of tags, one for 
each function in the module. On entry to the module, the register **ENV** will 
contain the "environment" for the function to be executed. As noted above, the 
format of this is arbitrary. For functions compiled by RABBIT, this is a list 
whose car is a tag within the PROG and whose cdr is the "real environment". 
{Note Code Pointers} At the beginning of the PROG there is always the code 

(GO (PR0G2 NIL (CAR **ENV**) 

(SETQ **ENV** (CDR **ENV**)))) 

the effect of which is to put the "real environment" in **ENV** and then perform 
a computed GO to the appropriate tag. (This is the only circumstance in which 
the MacLISP PR0G2 and computed GO constructs are used by RABBIT-compiled code. 
Either could be eliminated at the expense of more bookkeeping, the former by 
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using a temporary intermediate variable, the latter by using a giant COND with 
non-computed GO statements (which is effectively how the MacLISP compiler 
compiles a computed GO anyway). As always, such trivial issues are left to the 
MacLISP compiler when they do not bear on the issues of interest in compiling 
SCHEME code.) For small functions, often the "main entry point- is the only 
closed function, and it would be possible to eliminate the computed GO, but 
RABBIT always outputs one, because is is cheap and provides a useful error check. 
Once the computed GO has been performed, the code following the tag is 
responsible for performing its bit of computation and then exiting. It may exit 
by setting the **FUN** register to another function, setting up appropriate 
argument registers, and then doing (RETURN NIL) to exit the module and enter the 
UUO handler; or it may exit by directly transferring control to another function 
within the module by performing a GO to the appropriate tag, after setting up the 
arguments and **ENV**. In the latter case the arguments may actually be passed 
through "memory locations" rather than the standard "registers". (Conceptually, 
in this optimized case the environment needed for the function being called is 
being passed, not in **ENV**, but spread out in those registers and locations 
lower than those being used to pass the arguments.) 

Starting with the CPS version of one or more user functions, the 
generation of the code for a module proceeds iteratively. Code for each function 
is generated in turn, producing one segment of code and a tag; this tag and code 
will become part of the body of the module. In processing a function, other 
functions may be encountered; in general, each such function is added to the 
list of outstanding functions for the module, and is replaced by code to generate 
a closure for that function. When all functions have been processed, the outer 
structure of the module is created. 

Many situations are treated specially. For example, 
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((LAMBDA ...) ...) 

does not cause the LAMBDA-expression to be added to the list of outstanding 
functions; rather, a MacLISP PROGN is constructed consisting of the argument 
set-up followed by the code for the body of the LAMBDA- express ion. A more subtle 
case is 

(FOO (LAMBDA ...) ...) 

where FOO is the name of a MacLISP primitive and the LAMBDA-expression is the 
continuation. In this case a PROGN is constructed consisting of calling the 
MacLISP primitive on the other arguments, putting this value into the appropriate 
location, and then executing the body of the LAMBDA-expression. (It should be 
noted that all these special cases must be anticipated by the analysis preceding 
the code generation phase.) 

In the case of ((LAMBDA ...) ...), we must also handle the argument set- 
up a little carefully, because parameters which are never referred to or which 
represent known non-closed functions need not actually be passed. However, the 
corresponding argument for the first case must nevertheless be evaluated because 
it may have a side-effect. A good example is the result of expanding BLOCK 
(neglecting the effects of optimization): there is a (continuation-passing 
style) combination of the form: 

((LAMBDA (C A B) (B C)) cont x (LAMBDA (C) y)) 

The argument x need not be passed, but presumably has a side effect and so must 
be evaluated. The second LAMBDA-expression need not be closed, and so requires 
neither evaluation nor passing. The output code uses a PROGN to evaluate the 
arguments which are potentially for effect. In this way the end result of a 
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BLOCK construct actually turns out to be a MacLISP PROGN. (The routine 
LAMBDACATE in the Appendix is responsible for this analysis.) 
{Note Evaluation for Effect} 

Another case of interest is a combination whose function position 
contains a variable with a KNOWN-FUNCTION property. The value of this property 
is the node for the CPS version of the function, which provides information about 
pos ..* ^ code generation strategies. We can decide which arguments needn't be 
passed as for the ((LAMBDA ...) ...) case, and can also arrange to call the 
function with a direct (MacLISP) GO to the appropriate tag within the module. 
The set-up of the environment depends on whether the function is non-closed or 
partially closed; in the latter case the partial closure is the environment, and 
in the former the environment can be recovered from the current one (and may even 
be the same). 

A certain amount of "peephole optimization" [McKeeman] is also performed, 
primarily to make it easier for people to inspect the code produced, since the 
MacLISP compiler will handle them anyway. Examples of these are avoiding the 
generation of SETQ of a variable to the value of that same variable; reduction 
of car-cdr chains to single functions, such as (CAR (CDR (CDR x))) to (CADDR x) ; 
removal of nested PROGN's such as 

(PROGN a (PROGN b c) d) => (PROGN abed) 

and the like; and simplification of nested COND's, such as 

(COND (a b) (COND (a b) 

(T (COND (c d) => ( C d) 
...))) ...) 

One of the effects of this last peephole optimization is that many times, when 
the user writes a COND in a piece of SCHEME code, that COND is expanded into IF 
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constructs, and then re-contracted by the peephole optimization into an 
equivalent COND! (This fact is of no practical consequence, but looks cute.) 
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9. Example: Compilation of Iterative Factorial 

Here we shall provide a complete example of the compilation of a simple 
function IFACT (iterative factorial), to show what quantities are computed in the 
course of analyzing the code. We shall need some notation for the data 
structures involved. Every node of the program is represented by a small data 
structure which has a type and several named components. (In the actual 
implementation, a node is represented as two such structures; one contains named 
components common to all program nodes, and the other contains components 
specific to a given node type. We shall gloss over this detail here.) For 
example, a LANBDA-expression is represented by a structure of type LAMBDA with 
components named UVARS (user variable names), VARS (the alpha-converted names), 
BODY (the node representing the body), ENV (the environment of the node), and so 
on. We shall represent a data structure as the name of its type, with the 
components written below it and indented, with colons after each component name. 
For example: 

LAMBDA 

UVARS: (A B) 

VARS: (VAR-43 VAR-44) 

BODY: COMBINATION 

ARGS: VARIABLE 

VAR : F 
VARIABLE 

VAR: VAR-44 
VARIABLE 

VAR: VAR-43 

Notice that the value of a component may itself be a structure. These structures 
are always arranged in a tree, so no notation for cycles will be needed. In the 
case where a component contains a list of things, we will write the things as a 
LISP list unless the things are structures, in which case we will simply write 
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them in a vertical stack, as shown in the example above. To conserve space, in 
any single diagram we will show only the named components of interest. 
Components may seem to appear and then disappear in the series of diagrams, but 
in practice they all exist simultaneously. 
The source code for our example: 

(DEFINE IFACT 

(LAMBDA (N) 

(LABELS ((F (LAMBDA (M A) 

(IF (= M 0) A 

(F (- M 1) (* M A)))))) 
(F N 1)))) 

The alpha-conversion process copies the program and produces a tree of 
structures. All the bound variables are renamed, and VARIABLE nodes refer to 
these new names. The GLOBALP component in a VARIABLE node is non-NIL iff the 
reference is to a global variable. The ENV component is simply an a-list 
relating the user names of variables to the new names; this a-list is computed 
during the conversion as the new names are created at LAMBDA, LABELS, and CATCH 
nodes. 



LAMBDA 

ENV: () 
UVARS : ( N ) 
VARS: (VAR-1) 
BODY: 

LABELS 

ENV: ((N VAR-1)) 
UFNVARS: (F) 
FNVARS: (FNVAR-2) 
FNDEFS: 

LAMBDA 

ENV: ((F FNVAR-2) (N VAR-1)) 

UVARS: (M A) 

VARS: (VAR-3 VAR-4) 

BODY: IF 

ENV: ((A VAR-4) (M VAR-3) (F FNVAR-2) (N VAR-1)) 
PRED: COMBINATION 

ENV: *** (see below) 
ARGS: VARIABLE 
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CON: 



ALT: 





ENV: 


*** 






VAR: 


s 






GLOBALP 


': T 






VARIABLE 








ENV: 


*** 






VAR: 


VAR -3 






GLOBALP 


: NIL 






CONSTANT 








ENV: 


*** 






VALUE : 







VARIABLE 








ENV: 


*** 






VAR: 


VAR -4 






GLOBALP 


: NIL 






COMBINATION 








ENV: 


*** 






ARGS: 


VARIABLE 








ENV: 


*** 






VAR: 


FNVAR-2 






GLOBALP 


: NIL 






COMBINATION 








ENV: 


*** 






ARGS: 


VARIABLE 








ENV: 


*** 






VAR: 


- 






GLOBALP 


: T 






VARIABLE 








ENV: 


*** 






VAR: 


VAR- 3 






GLOBALP 


: NIL 






CONSTANT 








ENV: 


*** 






VALUE : 


1 




COMBINATION 








ENV: 


*** 






ARGS: 


VARIABLE 








ENV: 


*** 






VAR: 


* 






GLOBALP 


: T 






VARIABLE 








ENV: 


*** 






VAR: 


VAR -3 






GLOBALP: 


: NIL 






VARIABLE 








ENV: 


*** 






VAR: 


VAR -4 






GLOBALP: 


NIL 



BODY: COMBINATION 

ENV: ((F FNVAR-2) (N VAR-1)) 

ARGS: VARIABLE 

ENV: ((F FNVAR-2) (N VAR-1)) 

VAR: FNVAR-2 

GLOBALP: NIL 
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VARIABLE 

ENV: ((F FNVAR-2) (N VAR-1)) 

VAR: VAR-1 

GLOBALP: NIL 
CONSTANT 

ENV: ((F FNVAR-2) (N VAR-1)) 

VALUE : 1 



The reader is asked to imagine that the expression 

((A VAR-4) (M VAR-3) (F FNVAR-2) (N VAR-1)) 

occurs where *** appears in the diagram. It should be clear how the ENV 
components are computed on the basis of variables bound at the LAMBDA and LABELS 
nodes. The ENV information propagates down the tree to VARIABLE nodes, where it 
is used to supply the correct new name for the one used by the original code. 

The first step in the preliminary analysis is the determination of 
referenced variables: 



LAMBDA 
REFS 
VARS 
BODY 



() 
(VAR-1) 

LABELS 

REFS: 
FN VARS: 
FNOEFS: 

LAMBDA 
REFS 
VARS 
BODY 



(VAR-1) 
(FNVAR-2) 



(FNVAR-2) 
(VAR-3 VAR-4) 
IF 

REFS: (FNVAR-2 VAR-3 VAR-4) 
PRED: COMBINATION 

REFS: (VAR-3) 
AR6S: VARIABLE 

REFS: () 
VAR: 

GLOBALP: T 
VARIABLE 

REFS: (VAR-3) 
VAR: VAR-3 
GLOBALP: NIL 
CONSTANT 

REFS: () 
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CON: 



ALT: 





VALUE : 





VARIABLE 






REFS: 


(VAR-4) 




VAR: 


VAR-4 




GLOBALP 


: NIL 




COMBINATION 






REFS: 


(FNVAR-2 VAR-3 VAR-4) 


ARGS: 


VARIABLE 






REFS: 


(FNVAR-2) 




VAR: 


FNVAR-2 




GLOBALP 


: NIL 




COMBINATION 






REFS: 


(VAR-3) 




ARGS: 


VARIABLE 

REFS: () 

VAR: 

GLOBALP: T 
VARIABLE 

REFS: (VAR-3) 

VAR: VAR-3 

GLOBALP: NIL 
CONSTANT 

REFS: () 

VALUE: 1 




COMBINATION 






REFS: 


(VAR-3 VAR-4) 




ARGS: 


VARIABLE 

REFS: () 
VAR: * 
GLOBALP: T 

VARIABLE 

REFS: (VAR-3) 
VAR: VAR-3 
GLOBALP: NIL 

VARIABLE 

REFS: (VAR-4) 
VAR: VAR-4 
GLOBALP: NIL 



BODY: 



COMBINATION 




REFS: 


(FNVAR-2 VAR-1) 


ARGS: 


VARIABLE 




REFS: (FNVAR-2 




VAR: FNVAR-2 




GLOBALP: NIL 




VARIABLE 




REFS: (VAR-1) 




VAR: VAR-1 




GLOBALP: NIL 




CONSTANT 




REFS: () 




VALUE : 1 
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The REFS component is a list of all local variables referenced at or below the 
node. Notice that, in general, the REFS component of a node is the union 
(considering them as sets) of the REFS components of its subnodes. In this way 
the information sifts up from the VARIABLE nodes. At a LAMBDA, LABELS, or CATCH, 
the variables bound at that node are filtered out of the REFS sifting up. The 
REFS for the outer function must always be (), a useful error check. In this 
example, we see that VAR-1 (N) is not referenced by the function FNVAR-2 (F). 
This indicates that a closure for this function need not contain the value for 
VAR-1 in its environment. (We will not actually use the information for this 
purpose, since later analysis will determine that the function need not have a 
closure constructed for it.) Another component ASETVARS is computed for each 
node, which contains the set of variables appearing in an ASET' at or below the 
node. We have omitted this information from the diagram since the value is the 
empty set in all cases. Certain properties are placed on the property list of 
each variable as well, which are not shown here. 
The next pass locates trivial subforms: 

LAMBDA 

TRIVP: NIL 
VARS: (VAR-1) 
BODY: 

LABELS 

TRIVP: NIL 
FNVARS: (FNVAR-2) 
FNDEFS: 

LAMBDA 

TRIVP: NIL 

VARS: (VAR-3 VAR-4) 

BODY: IF 

TRIVP: NIL 

PRED: COMBINATION 

TRIVP: T 

ARGS: VARIABLE 

TRIVP: T 
VAR: 

GLOBALP: T 
VARIABLE 
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TRIVP: 
VAR: 
GLOBALP 
CONSTANT 
TRIVP: 
VALUE : 



T 

VAR -3 
NIL 

T 




CON: VARIABLE 

TRIVP: T 
VAR: VAR -4 
GLOBALP: NIL 

ALT: COMBINATION 

TRIVP: NIL 
ARGS 



VARIABLE 
TRIVP: 
VAR: 
GLOBALP 

COMBINATION 
TRIVP: 
ARGS: 



COMBINATION 
TRIVP: 
ARGS: 



BODY: COMBINATION 

TRIVP: NIL 
ARGS: 



FNVAR-2 
: NIL 



VARIABLE 

TRIVP: T 
VAR: 
GLOBALP: T 

VARIABLE 

TRIVP: T 
VAR: VAR -3 
GLOBALP: NIL 

CONSTANT 

TRIVP: T 
VALUE: 1 



VARIABLE 

TRIVP: T 
VAR: * 
GLOBALP: T 

VARIABLE 

TRIVP: T 
VAR: VAR -3 
GLOBALP: NIL 

VARIABLE 

TRIVP: T 
VAR: VAR -4 
GLOBALP: NIL 



VARIABLE 

TRIVP: T 

VAR: FNVAR-2 

GLOBALP: NIL 

VARIABLE 

TRIVP: T 
VAR: VAR-1 
GLOBALP: NIL 
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CONSTANT 

TRIVP: T 
VALUE : 1 



Constants and variables are always trivial, and trivial combinations (involving 
only MacLISP primitives) are located. As before, in this pass information sifts 
up from below. One possibility not yet explored in RABBIT is to isolate entire 
SCHEME functions (for example FNVAR-2), determine that it is, as a whole, 
trivial, compile it as a simple MacLISP SUBR, and reference it as a primitive. 
This would in turn render trivial the combination (F N 1) in the body of the 
LABELS, for example. 

The analysis of side-effects merely determines that no side-effects are 
present, and is uninteresting for our example. The optimization pass finds no 
transformations worth making. We will skip over these steps to the conversion to 
continuation-passing style. As a simple S-expression, this may be rendered as: 

(LAMBDA (CONT-5 VAR-1) 

(LABELS ((FNVAR-2 

(LAMBDA (CONT-6 VAR-3 VAR-4) 
(IF (= VAR-3 0) 

(CONT-6 VAR-4) 
(FNVAR-2 CONT-6 

(- VAR-3 1) 
(* VAR-3 VAR-4)))))) 
(FNVAR-2 CONT-5 VAR-1 1))) 

In rendering this as a tree of data structures, we use structures of type CLAMBDA 
instead of LAMBDA, etc., in order to prevent confusion. Trivial forms are 
represented by structures of type TRIVIAL with pointers to the data structures 
from before. We will not notate such data structures in the following diagrams, 
but will simply write an S-expression as a reminder of what the trivial form was. 
The types RETURN and CONTINUATION are like COMBINATION and CLAMBDA, but are 
distinguished as discussed above for convenience and for purposes of consistency 



checking 
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CLAMBDA 
VARS: 
BODY: 



(CONT-5 VAR-1) 
CLABELS 

FNVARS: (FNVAR-2) 
FNDEFS: CLAMBDA 
VARS: 
BODY: 



(CONT-6 VAR-3 VAR-4) 
CIF 



PRED 



CON 



ALT: 



TRIVIAL 

(= VAR-3 0) 
RETURN 

CONT: CVARIABLE 

VAR: 
VAL: TRIVIAL 

VAR-4 
CCOMBINATION 

ARGS: TRIVIAL 



CONT-6 



BODY: CCOMBINATION 

ARGS: TRIVIAL 

FNVAR-2 
CVARIABLE 

VAR: 
TRIVIAL 

VAR-1 
TRIVIAL 
1 



FNVAR-2 
CVARIABLE 

VAR: CONT-6 
TRIVIAL 

(- VAR-3 1) 
TRIVIAL 

(* VAR-3 VAR-4) 



CONT-5 



The first post-conversion analysis pass computes ENV and REFS components 
as before, this time including the variables introduced to represent 
continuations. The ENV in this case is not an a-list, but simply a list of 
variables, since no renaming is taking place. The ENV information sifts down 
from above during the tree walk, and on the way back the REFS information sifts 
up. For a TRIVIAL node, the REFS information is taken from the pre-conversion 
node referenced by the TRIVIAL node; this REFS information is shown here as a 
reminder. As before, the REFS information for a node is always a subset of the 



ENV information. 
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CLAMBOA 












ENV: 


() 










REFS: 


() 










VARS: 


(CONT-5 VAR 


-1) 








BODY: 


CLABELS 












ENV: 


(CONT-5 VAR 


-1) 








REFS: 


(CONT-5 VAR 


-1) 








FNVARS: 


(FNVAR-2) 










FNDEFS: 


CLAMBDA 












ENV: 


(FNVAR-2 1 


:ONT-5 VAR-1) 








REFS: 


(FNVAR-2) 










VARS: 


(CONT-6 VAR-3 VAR-4) 








BODY: 


CIF 

ENV: 


*** 










REFS: 


(FNVAR-2 CONT-6 VAR-3 VAR-4) 








PREO: 


TRIVIAL 
REFS: 
(« VAR- 


(VAR-3) 
3 0) 








CON: 


RETURN 
ENV: 
REFS: 
CONT: 

VAL: 


*** 

(CONT-6 VAR-4) 

CVARIABLE 

ENV: *** 

REFS: (CONT-6) 

VAR: CONT-6 
TRIVIAL 

REFS: (VAR-4) 

VAR-4 








ALT: 


COMBINATION 










ENV: 


*** 










REFS: 


(FNVAR-2 CONT-6 VAR-3 VAR-4) 










ARGS: 


TRIVIAL 

REFS: (FNVAR-2) 

FNVAR-2 
CVARIABLE 

ENV: *** 

REFS: (CONT-6) 

VAR: CONT-6 
TRIVIAL 

REFS: (VAR-3) 

(- VAR-3 1) 
TRIVIAL 

REFS: (VAR-3 VAR-4) 

(* VAR-3 VAR-4) 



BODY: 



COMBINATION 
ENV: 
REFS 
ARGS 



(FNVAR-2 CONT-5 VAR-1) 
(FNVAR-2 CONT-5 VAR-1) 
TRIVIAL 

REFS: (FNVAR-2) 

FNVAR-2 
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CVARIABLE 

ENV: (FNVAR-2 CONT-5 VAR-1) 

REFS: (CONT-5) 

VAR: CONT-5 
TRIVIAL 

REFS: (VAR-1) 

VAR-1 
TRIVIAL 

REFS: () 

1 



The reader is asked to imagine that where *** occurs the expression 
(CONT-6 VAR-3 VAR-4 FNVAR-2 CONT-5 VAR-1) 

had been written instead. An additional operation performed on this pass is to 
flag all variables referenced in other than function position. These include 
VAR-1, VAR-3, etc.; but FNVAR-2 is not among them. This will be of importance 
below. 

The next pass determines all variables referenced by closures at or below 
each node, and also decides which functions will actually be closed. It is 
determined that FNVAR-2 need not be closed, because it is referred to only in 
function position (as determined by the previous pass), and is not referred to by 
any other closures. As a result, no closures are created at all in this 
function, and so all the computed sets of variables are empty. This pass also 
assigns the name F-7 to the outer function, for use later as a tag. 

The third pass computes the "depth" of each function, which determines 
through what registers or other locations arguments will be passed for each 
function. In this case the outer CLANBDA is assigned depth 0, and the one 
labelled FNVAR-2 is assigned depth 2, because it is not closed, and is contained 
in a depth function of 2 arguments. In this way registers are allocated in a 
purely stack-like manner; all closed functions are of depth 0, and all unclosed 
ones are at a depth determined by that of the containing function and its number 
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of arguments. 

One way to think about this trick is as follows. A closure consists of a 
pointer to a piece of code and a set of values determined at the time of closure. 
When the closure is invoked, we execute the code, making available to it (a) the 
set of values (its environment), and (b) some additional arguments. Slicing 
these components a different way, we may think of calling the bare code, 
supplying all the values as arguments; we pass the arguments in some registers, 
and the environment values in some other registers. Put yet another way, if we 
can determine that every caller of the closed function can reconstruct the 
necessary environment at the time of the call (because it will have available the 
necessary values anyway), then we can avoid constructing the closure at the point 
where the function should be closed, and instead arrange for each caller to pass 
the environment through specified registers. As mentioned earlier, the compiler 
has a completely free hand in determining the format of an environment! 

As it happens, the function labelled FNVAR-2 does not reference CONT-5 or 
VAR-1, and so this argument is of no importance here. It is determined that the 
following register assignments will apply: 



CONT-5 


**CONT** 


VAR-1 


**ONE** 


FNVAR-2 


<none> 


CONT-& 


**TW0** 


VAR-3 


**THREE** 


VAR-4 


**FOUR** 



{Note Continuation Variable Hack} We will see below that some unnecessary 
shuffling of values results; a more complicated register assignment technique 
would be useful here. (One was outlined in [Declarative], but it has not been 
implemented. See also [Wulf] and [Johnsson].) 

The fourth post-conversion analysis pass determines the format of 
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environments for closed functions. Since there are none in this example, this 
analysis is of little interest here. 

Finally, we are ready to generate code. Consider the S-expression form: 

(LAMBDA (CONT-5 VAR-1) 

(LABELS ((FNVAR-2 

(LAMBDA (CONT-6 VAR-3 VAR-4) 
(IF (= VAR-3 0) 
(CONT-6 VAR-4) 
(FNVAR-2 CONT-6 

(- VAR-3 1) 
(* VAR-3 VAR-4)))))) 
(FNVAR-2 CONT-5 VAR-1 1))) 

The first function encountered is the outer one (named F-7). In analyzing its 
body we note the LABELS, and place all the labelled functions (that is, FNVAR-2) 
on the queue of functions yet to be processed. We then analyze the body of the 
LABELS. This is a combination, and so we analyze each argument, producing code 
for each. Each argument must be TRIVIAL, a (C)VARIABLE, or a (C)LAMBDA- 
expression. (We shall refer to this set of possibilities as "meta-trivial", 
which means what "trivial" did in [Imperative].) The variable FNVAR-2 refers to 
a known function which is not closed, and so we need not set up **FUN**. The 
others may be referred to as **CONT**, **ONE**, and the constant 1, respectively. 
These are to be passed to FNVAR-2 through the registers **TWO**, **THREE**, and 
**FOUR** (as determined by the register allocation pass). Thus the code for F-7 
looks like this: 

F-7 ((LAMBDA (Q-40 Q-41 Q-42) 

(SETQ **FOUR** Q-42) 
(SETQ **THREE** Q-41) 
(SETQ **TWO** Q-40)) 
**CONT** **ONE** '1) 
(GO FNVAR-2) 

The first form sets up the arguments, using a standard "simultaneous assignment" 
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construction. The second branches to the code for FNVAR-2. Because a known 
function is being called, it is not necessary to set up **NARGS**. Because 
FNVAR-2 requires no closure, it is not necessary to set up **ENV**. 

The next function on the queue to process is FNVAR-2. Its body is an IF 
(actually a CIF); this is compiled into a COND containing the code for the 
predicate, consequent, and alternative: 

(COND (<predicate> <consequent>) 
(T <alternative>)) 

The predicate is guaranteed to be meta-trivial. It is, in this example, a 
trivial combination; this is compiled by changing all the variable references 
appropriately, producing (= **THREE** '0). 

The consequent involves calling an unknown continuation which is in 
**TWO**. The returned value is in **FOUR**. The code produced is: 

(SETQ **FUN** **TWO**) 
(SETQ **ONE** **FOUR**) 
(RETURN NIL) 

The (RETURN NIL) exits the module, passing control to the dispatcher in the 
SCHEME interpreter, which will arrange to invoke the continuation. 

The code for the alternative is similar to that for the body of F-7, 
because we are calling the known function FNVAR-2. The generated code is: 

((LANBDA (Q-43 Q-44) 

(SETQ **FOUR** Q-44) 
(SETQ **THREE** Q-43)) 
(- **THREE** '1) 
(* **THREE** **FOUR**)) 
(GO FNVAR-2) 

The argument set-up ought to involve copying **TWO** into **TWO**, but a peephole 
optimization eliminates that SETQ. 
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Putting all this together, the code for FNVAR-2 is: 

FNVAR-2 (COND ((= **THREE** '0) 

(SETQ **FUN** **TWO**) 
(SETQ **ONE** **FOUR**) 
(RETURN NIL)) 
(T ((LAMBDA (Q-43 Q-44) 

(SETQ **FOUR** Q-44) 
(SETQ **THREE** Q-43)) 
(- **THREE** «1) 
(* **THREE** **FOUR**)) 
(GO FNVAR-2))) 

(We have glossed over the peephole optimizations which eliminate occurrences of 
PROGN in such places as COND clauses.) 

There are no more functions to be processed, and so we now create the 
final module. The final output, with comments inserted by RABBIT for debugging 
purposes, and declarations supplied by RABBIT for the benefit of the MacLISP 
compiler, looks like this: 

(PROGN 'COMPILE 

(COMMENT MODULE FOR FUNCTION IFACT) 
(DEFUN ?-37 ( ) 
( PROG ( ) 

(DECLARE (SPECIAL ?-37)) 

(GO (PR0G2 NIL (CAR **ENV**) (SETQ **ENV** (CDR **ENV**)))) 
F-7 (COMMENT (DEPTH = 0) (FNP = NIL) (VARS = (C0NT-5 N))) 
((LAMBDA (0-40 Q-41 Q-42) 

(SETQ **FOUR** Q-42) 
(SETQ **THREE** Q-41) 
(SETQ **TWO** 0-40)) 
**CONT** **ONE** '1) 
(COMMENT (DEPTH = 2) (FNP = NOCLOSE) (VARS = (CONT-6 M A))) 
(GO FNVAR-2) 
FNVAR-2 

(COMMENT (DEPTH = 2) (FNP * NOCLOSE) (VARS « (CONT-6 MA))) 
(COND ((= **THREE** '0) 

(SETQ **FUN** **TWO**) 
(SETQ **ONE** **FOUR**) 
(RETURN NIL)) 
(T ((LAMBDA (Q-43 Q-44) 

(SETQ **FOUR** Q-44) 
(SETQ **THREE** Q-43)) 
(- **THREE** '1) 
(* **THREE** **FOUR**)) 
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(COMMENT (DEPTH « 2) (FNP « NOCLOSE) (VARS = (CONT-6 M A))) 

(GO FNVAR-2))))) 
(SETQ ?-37 (GET ' ?-37 'SUBR)) 
(SETQ IFACT (LIST 'CBETA ?-37 'F-7)) 
(OEFPROP ?-37 IFACT USER-FUNCTION)) 

In the interpolated comments, FNP refers to whether the function being entered or 
being called is closed or not (the possibilities are NIL, NOCLOSE, and EZCLOSE). 
The VARS are the passed variables, expressed as the names from the original 
source code, except for those introduced by the CPS conversion. The form (SETQ 
IFACT ...) constructs the closure for the globally defined function IFACT. The 
DEFPROP form provides debugging information. 

The points of interest in this example are the isolation of trivial 
subforms, and the analysis of the function FNVAR-2 which allows it to be called 
with GO. Examination of the output code will show that FNVAR-2 is coded as an 
iterative loop. While the register allocation leaves something to be desired, 
the inner loop does surprisingly little shuffling. (This should be compared with 
the code suggested in [Declarative] for this function.) 

For those who prefer "real" machine language, we give a plausible 
transcription of the MacLISP code into our hypothetical machine language: 

IFACT: PUSH CONT ;CONT contains the return address 

PUSH ONE 
PUSH 1 
POP FOUR 
POP THREE 
POP TWO 
GOTO FNVAR2 
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FNVARZ: JUMP-IF-ZERO THREE, FNV2A 

MOVE ONE, FOUR 

RETURN (TWO) ; return to address in TWO 

FNV2A: MOVE TEMP, THREE ;TEMP is used to evaluate 

ADD TEMP,1 ; trivial forms 

PUSH TEMP 

MOVE TEMP, THREE 

MUL TEMP, FOUR 

PUSH TEMP 

POP FOUR 

POP THREE 

GOTO FNVAR2 

While this is not the world's most impressively tight code, it again shows the 
essential iterative structure of the inner loop. The primary problem is the 
absence of analysis of which registers are used when. Leaving aside the question 
of allocating registers, one could at least determine when assigning values to 
registers for argument set-up can occur sequentially rather than simultaneously. 
There are a few other obvious optimizations which have not been 
performed, for example the elimination of (GO FNVAR-2) just before the tag FNVAR- 
2. While this would not have been difficult, we knew that the MacLISP compiler 
would take care of this for us; since it is not a very interesting issue, we let 
it slide. 
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10. Performance Measurements 

RABBIT has provision for metering runtime usage, and for controlling 
whether certain options in the optimizer are used. The standard test case has 
been RABBIT compiling itself (!); by running both interpreted and compiled 
version of this task, some comparisons have been made. Two different compiled 
versions have also been tested, where the code was produced with or without using 
the optimizer. 

The overall speed gain of unoptimized compiled code over interpreted code 
has been measured to be a factor of 25. The speed gain ratio excluding time for 
garbage collection was 17, and the garbage collection time ratio was 34. (The 
SCHEME interpreter does a lot of consing. The straight runtime ratio of 17 is 
roughly typical for standard LISP compilers on non-numeric code.) 

The overall speed ratio of optimized compiled code to unoptimized 
compiled code has been measured to be 1.2. The speed ratio excluding garbage 
collection was 1.37, and the garbage collection time ratio was 1.07. We conclude 
that the amount of consing was reduced very little, despite optimizations which 
may eliminate closures, because the phase-2 analysis of closures eliminated most 
consing from that source anyway. Eliminations of register shuffling because of 
substitutions of one variable for another were probably more significant. 

Combining these figures yields an overall speed ratio for optimized 
compiled code over interpreted code of about 30. 

Turning now to the analysis of compilation time, as opposed to running 
time, we have found that using the optimizer approximately doubles the cost of 
compilation. It might be possible to reduce this with a more clever optimizer; 
presently RABBIT wastes much time re-doing certain analysis unnecessarily. The 
extra time needed by the optimizer excluding garbage collection is only half 
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again the overall compilation time, but the garbage collection time triples, 
because the optimizer copies and re-copies parts of the program. 

There is also one error check which is very expensive; it checks every 
argument of a combination against every other argument to check for possible 
side-effect conflicts (this is the "liberal" analysis in EFFS-ANALYZE, and the 
testing done by CHECK-COMBINATION-PEFFS). Use of this error check increased 
compilation time by thirty percent. 
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11. Comparison with Other Work 

The only other work we know of similar to ours is that in 
[Wand and Friedman]. They use a technique from category theory known as 
factorization to isolate trivial expressions. As far as they go, their work is 
similar to ours; they have written a compiler for LISP code, producing output 
code which uses continuations. However, they indicate that they cannot interface 
compiled and interpreted code correctly. Moreover, while they use continuations, 
they do not make general use of closures, and in fact there is no clue that 
closures are permitted in their source language, or that functions are 
permissible as data objects. (In fact, there is evidence to the contrary in 
several examples they give involving an expression 

(MAPCAR (QUOTE (LAMBDA ...)) ...) 

These seem to indicate that they have not made the crucial distinction between 
treating a function as a data object and treating a representation of a function 
as data.) Wand and Friedman do realize the importance of tail-recursion, but 
fail to mention the necessity for lexical scoping (perhaps taking it for 
granted). We feel that the contributions of category theory may provide 
interesting new ways to analyze programs, but also feel that Wand and Friedman 
have not, in the work cited, explored it thoroughly, since they have not even 
explored the issue of closures as such. 

Somewhat more distantly related is the work of Carter and others at the 
IBM T.J. Watson Research Lab. [Carter] This work is similar in spirit, in that 
it uses "macro definitions" of complex operators, which are integrated into the 
program being compiled, followed by source-to-source program transformations 
which optimize the resulting code. However, they have primarily worked with 
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definitions of complex data manipulations, such as string concatenation, whereas 
this report has dealt exclusively with environment and control operations. 
(Also, as a matter of taste, we find SCHEME a simpler and more tractable language 
to deal with than the low-level dialect of PL/I used in [Carter], partly because 
of its closeness to lambda-calculus and partly because SCHEME inherits from LISP 
the natural ability to deal with representations of its own programs.) 
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12. Conclusions and Future Work 

Lexical scoping, tail-recursion, the conceptual treatment of functions 
(as opposed to representations thereof) as data objects, and the ability to 
notate "anonymous" functions make SCHEME an excellent language in which to 
express program transformations and optimizations. Imperative constructs are 
easily modelled by applicative definitions. Anonymous functions make it easy to 
avoid needless duplication of code and conflict of variable names. A language 
with these properties is useful not only at the preliminary optimization level, 
but for expressing the results of decisions about order of evaluation and storage 
of temporary quantities. These properties make SCHEME as good a candidate as any 
for an UNCOL. The proper treatment of functions and function calls leads to 
generation of excellent imperative low-level code. 

We have emphasized the ability to treat functions as data objects. We 
should point out that one might want to have a very simple run-time environment 
which did not support complex environment structures, or even stacks. Such an 
end environment does not preclude the use of the techniques described here. Many 
optimizations result in the elimination of LAMBDA-expressions; post CPS- 
conversion analysis eliminates the need to close many of the remaining LAMBDA- 
expressions. One could use the macros and internal representations of RABBIT to 
describe intermediate code transformations, and require that the final code not 
actually create any closures. As a concrete example, imagine writing an 
operating system in SCHEME, with machine words as the data domain (and functions 
excluded from the run-time data domain). We could still meaningfully write, for 
example: 
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(IF (OR (STOPPED (PROCESS I)) 

(AWAITING-INPUT (PROCESS I))) 
(SCHEDULE-LOOP (+11)) 
(SCHEDULE-PROCESS I)) 

While the intermediate expansion of this code would conceptually involve the use 
of functions as data objects, optimizations would reduce the final code to a form 
which did not require closures at run time. 

An experiment we would like to try would be to use CGOL [Pratt], a 
program which parses ALGOL-like syntax and produces LISP code, as a front end for 
RABBIT. The result would be a compiler for an ALGOL-like language which would 
produce code by the processes of parsing (by CGOL); macro-expansion, 
optimization, and output of NacLISP code (by RABBIT); and generation of PDP-10 
machine language (by the MacLISP compiler). 

Among the interesting issues we have not dealt with or have not yet 
implemented in RABBIT are: compilation of data manipulation primitives, 
interaction of such primitives, procedure integration of the most general form, 
and complex register allocation. A particularly interesting issue is that of 
data type analysis. Such analysis would solve certain problems which cannot 
easily be solved now by RABBIT. For example, consider the piece of code: 

(IF (OR A B) X Y) 

The macro-expansion and optimization phases will reduce this to: 

(IF A (IF A X Y) (IF B X Y)) 

The difficulty is that RABBIT has no way of knowing that A is known to be non- 
null in the first inner IF by virtue of the testing of A in the outer IF. If it 
could realize this, then the code would reduce to the more reasonable: 
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(IF A X (IF B X Y)) 

Compare this with the case of (IF (AND ...) ...) presented earlier. 

One particularly nagging difficulty concerns an interaction between CATCH 
and optimization by substituting expressions for variables. The problem is that 
if an expression with a side-effect is substituted into a place which is 
evaluated after the return of a call to an unknown function (where it had been 
written at a place normally evaluated before the call), and if a CATCH is 
performed within that unknown function, and the escape function is subsequently 
called more than once, then the expression with a side-effect will be evaluated 
twice instead of once. There is no possible way to decide whether this can 
happen, other than to be fearful of all unknown function calls. In practice this 
defeats most optimization. We have ignored this difficulty in RABBIT. It 
probably indicates that escape functions are even more intractable than we had 
earlier believed. It would not be so bad if we could insist that an escape 
function be called no more than once (or rather, that a CATCH be returned from no 
more than once, implying that if the escape function is used it must be 
dynamically within the body of the CATCH). If this restriction is enforced, or 
if CATCH is forbidden, then in fact no continuation can be invoked more than 
once, which, with other suitable restrictions, accounts for the ability of most 
languages to use stacks instead of trees for their control stacks. 
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Notes 



{Note ASET' Is Imperative} 

It is true that ASET' is an actual imperative which produces a side 
effect, and is not expressed applicatively. ASET' is used only for two purposes 
in practice: to initialize global variables (often relating to MacLISP 
primitives), and to implement objects with state (cells, in the PLASMA sense 
[Smith and Hewitt] [Hewitt and Smith]). If we were to redesign SCHEME from 
scratch, I imagine that we would introduce cells as our primitive side-effect 
rather than ASET'. The decision to use ASET' was motivated primarily by the 
desire to interface easily to the MacLISP environment (and, as a corollary, to be 
able to implement SCHEME in three days instead of three years!). 

We note that in approximately one hundred pages of SCHEME code written by 
three people, the non-quoted ASET has never been used, and ASET' has been used 
only a dozen times or so, always for one of the two purposes mentioned above. In 
most situations where one would like to write an assignment of some kind, macros 
which expand into applicative constructions suffice. 



94 

(Note Code Pointers} 

Conceptually a closure is made up of a pointer to some code (a "script" 
[Smith and Hewitt]) and an environment. In a RABBIT-formatted CBETA, the pointer 
to the code is encoded into two levels: a pointer to a particular piece of 
MacLISP code, plus a tag within that PROG. This implementation was forced upon 
us by MacLISP. If we could easily create pointers into the middle of a PROG, we 
could avoid this two-level encoding. 

On the other hand, this is not just an engineering kludge, but can be 
provided with a reasonable semantic explanation: rather than compiling a lot of 
little functions, we compile a single big function which is a giant CASE 
statement. Wherever we wish to make a closure of a little function, we actually 
close a different little function which calls the big function with an extra 
argument to dispatch on. 



{Note Continuation Variable Hack} 

Since the dissertation was written, a simple modification to the routine 
which converts to continuation-passing style has eliminated some of the register 
shuffling. The effect of the change was to perform substitutions of one 
continuation variable for another, in situations such as: 

((CLAMBDA (CONT-3 ...) ...) 
CONT-Z ...) 

where CONIT-2 would be substituted for CONT-3 in the body of the CLANBDA- 
expression. Once this is done, CONT-3 is unreferenced, and so is not really 
passed at all by virtue of the phase-2 analysis. The result is that 
continuations are not copied back and forth from register to register. In the 
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iterative factorial example in the text, the actual register assignment would be 



CONT-5 


**CONT** 


VAR-1 


**ONE** 


FNVAR-2 


<none> 


VAR-3 


**TWO** 


VAR-4 


**THREE** 



This optimization is discussed more thoroughly in the Appendix near the routine 
CONVERT-COMBINATION. 



{Note Dijkstra's Opinion} 

In [Dijkstra] a remark is made to the effect that defining the while -do 
construct in terms of function calls seems unusually clumsy. In [Steele] we 
reply that this is due partly to Dijkstra's choice of ALGOL for expressing the 
definition. Here we would add that, while such a definition is completely 
workable and is useful for compilation purposes, we need never tell the user that 
we defined while -do in this manner! Only the writer of the macros needs to know 
the complexity involved; the User need not, and should not, care as long as the 
construction works when he uses it. 
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{Note Evaluation for Control} 

It is usual in a compiler to distinguish at least three "evaluation 
contexts": value, control, and effect. (See [Wulf], for example.) Evaluation 
for control occurs in the predicate of an IF, where the point is not so much to 
produce a data object as simply to decide whether it is true or false. The 
results of AND, OR, and NOT operations in predicates are "encoded in the program 
counter". When compiling an AND, OR, or NOT, a flag is passed down indicating 
whether it is for value or for control; in the latter case, two tags are also 
passed down, indicating the branch targets for success or failure. (This is 
called "anchor pointing" in [Allen and Cocke].) 

In RABBIT this notion falls out automatically without any special 
handling, thanks to the definition of AND and OR as macros expanding into IF 
statements. If we were also to define NOT as a macro 

(NOT x) => (IF x 'NIL 'T) 

then nearly all such special "evaluation for control" cases would be handled by 
virtue of the nested-IF transformation in the optimizer. 

One transformation which ought to be in the optimizer is 

(IF ((LAMBDA (X Y ...) <body>) A B ...) <con> <alt>) 

= > ((LAMBDA (X Y ...) (IF <body> <con> <alt») A B ...) 

which could be important if the <body> is itself as IF. (This transformation 
would occur at a point (in the optimizer) where no conflicts between X, Y, ... 
and variables used in <con> and <alt> could occur.) 
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{Note Evaluation for Effect} 

This is the point where the notion of evaluation for effect is handled 
(see {Note Evaluation for Control}). It is detected as the special case of 
evaluation for value where no one refers to the value! This may be construed as 
the distinction between "statement" and "expression" made in Algol-like 
languages. 



{Note Full-Funarg Example} 

As an example of the difference between lexical and dynamic scoping, 
consider the classic case of the "funarg problem". We have defined a function 
MAPCAR which, given a function and a list, produces a new list of the results of 
the function applied to each element of the given list: 

(DEFINE MAPCAR 

(LAMBDA (FN L) 

(IF (NULL L) NIL 

(CONS (FN (CAR L)) (MAPCAR FN (CDR L)))))) 

Now suppose in another program we have a list X and a number L, and want to add L 
to every element of X: 

(MAPCAR (LAMBDA (Z) (+ Z L)) X) 

This works correctly in a lexically scoped language such as SCHEME, because the L 
in the function (LAMBDA (Z) (+ Z L)) refers to the value of L at the point the 
LAMBDA-expression is evaluated. In a dynamically scoped language, such as 
standard LISP, the L refers to the most recent run-time binding of L, which is 
the binding in the definition of MAPCAR (which occurs between the time the 
LAMBDA-expression is passed to MAPCAR and the time the LAMBDA-expression is 
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invoked) . 



{Note Generalized LABELS} 

Since the dissertation was written, and indeed after [Revised Report] 
came out, the format of LABELS in SCHEME was generalized to permit labelled 
functions to be defined using any of the same three formats permitted by DEFINE 
in [Revised Report]. RABBIT has been updated to reflect this change, and the 
code for it appears in the Appendix. 



{Note Heap-Allocated Contours} 

RABBIT maintains heap-allocated environments as a simple chained list of 
variable values. However, all the variables which are added on at once as a 
single set may be regarded as a new "contour" in the Algol sense. Such contours 
could be heap-allocated arrays (vectors), and so an environment would be a 
chained list of such little arrays. The typical Algol implementation technique 
using a "display" (a margin array whose elements point at successive elements 
(contours) of the environment chain) is clearly applicable here. One advantage 
of the list-of-all-values representation actually used in RABBIT is that null 
contours automatically add no content to the environment structure, which makes 
it easier to recognize later, in the code generator, that no environment 
adjustments are necessary in changing between two environments which differ only 
by null contours (see the code for ADJUST-KNOWNFN-CENV in the Appendix). 
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{Note Loop Unrolling} 

In the case of a LABELS used to implement a loop, the substitution of a 
labelled function for the variable which names it would constitute an instance of 
loop unrolling [Allen and Cocke], particularly if the substitution permitted 
subsequent optimizations such as eliminating dead code. Here, as elsewhere, a 
specific optimization technique falls out as a consequence of the more general 
technique of beta-conversion. 



{Note Multiple-Argument Continuations} 

One could easily define a SCHEME-like language in which continuations 
could take more than one argument (that is, functions could return several 
values); see the discussion in [Declarative]. We have elected not to provide 
for this in SCHEME and RABBIT. 



{Note Non-deterministic CPS Conversion} 

As with optimization, so the conversion to continuation-passing style 
involves decisions which ideally could be made non-deterministically. The 
decisions made at this level will affect later decisions involving register 
allocation, etc., which cannot easily be foreseen at this stage. 
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{Note Non-deterministic Optimization} 

To simplify the implementation, RABBIT uses only a deterministic (and 
very conservative) optimizer. Ideally, an optimizer would be non-deterministic 
in structure; it could try an optimization, see how the result interacted with 
other optimizations, and back out if the end result is not as good as desired. 
We have experimented briefly with the use of the AMORD language [Doyle] to build 
a non-deterministic compiler, but have no significant results yet. 

We can see more clearly the fundamental unity of macros and other 
optimizations in the light of this hypothetical non-deterministic implementation. 
Rather than trying to guess ahead of time whether a macro expansion or 
optimization is desirable, it goes ahead and tries, and then measures the utility 
of the result. The only difference between a macro and other optimizations is 
that a macro call is an all-or-nothing situation: if it cannot be expanded for 
some reason, it is of infinite disutility, while if it can its disutility is 
finite. This leads to the idea of non-deterministic macro expansions, which we 
have not pursued. 



{Note Non-quoted A5ET} 

The SCHEME interpreter permits one to compute the name of the variable, 
but for technical and philosophical reasons RABBIT forbids this. We shall treat 
M ASET ,M as a single syntactic object (think "ASETQ"). 

Hewitt (private communication) and others have objected that the ASET 
primitive is "dangerous" in that one cannot predict what variable may be 
clobbered, and in that it makes one dependent on the representation of variables 
(since one can "compute up" an arbitrary variable to be set). The first is a 
valid objection on the basis of programming style or programming philosophy. 
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(Indeed, on this basis alone it was later decided to remove ASET from the SCHEME 
language, leaving only ASET' in [Revised Report].) The second is only slightly 
true; the compiler can treat ASET with an non-quoted first argument as a sort of 

macro. Let VI, VZ VN be the names of the bound variables accessible to the 

occurrence of ASET in question. These names are all distinct, for if two are the 
same, one variable "shadows" another, and so we may omit the one shadowed (and so 
inaccessible). Then we may write the transformation: 

(ASET a b) => ((LAMBDA (Ql Q2) 

(COND ((EQ Ql 'VI ) (ASET' VI Q2)) 
((EQ Ql 'V2) (ASET' V2 Q2)) 

((EQ Ql «VN) (ASET 1 VN Q2)) 
(T (GLOBAL-SET P Ql Q2)))) 
a 

b) 

This transformation is to be made after the alpha-conversion process, which 
renames all variables; Ql and Q2 are two more generated variables guaranteed not 

to conflict with VI VN. This expansion makes quite explicit the fact that 

we are comparing against a list of symbols to decide which variable to modify. 
The actual run-time representation of variables is not exploited, the one 
exception being the GLOBAL-SET operator, which raises questions about the meaning 
of the global environment and the user interface which we are not prepared to 
answer. 

(See also {Note ASET' Is Imperative}.) 
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{Note Old CPS Algorithm) 

We reproduce here Appendix A of [Declarative]: 

Here we present a set of functions, written in SCHEME, which convert a 
SCHEME expression from functional style to pure continuation-passing style. 
{Note PLASMA CPS} 

(ASET' GENTEMPNUM 0) 

(DEFINE GENTEMP 

(LAMBDA (X) 

(IMPLODE (CONS X (EXPLODEN (ASET* GENTEMPNUM (+ 6ENTEMPNUM 1))))))) 

GENTEMP creates a new unique symbol consisting of a given prefix and a unique 
number. 

(DEFINE CPS (LAMBDA (SEXPR) (SPRINTER (CPC SEXPR NIL '#CONT#)))) 

CPS (Continuation-Passing Style) is the main function; its argument is the 
expression to be converted. It calls CPC (C-P Conversion) to do the real work, 
and then calls SPRINTER to pretty-print the result, for convenience. The symbol 
#CONT# is used to represent the implied continuation which is to receive the 
value of the expression. 



103 



(DEFINE CPC 

(LAMBDA (SEXPR ENV CONT) 

(COND ((ATOM SEXPR) (CPC-ATOM SEXPR ENV CONT)) 
((EQ (CAR SEXPR) 'QUOTE) 

(IF CONT M (,CONT .SEXPR) SEXPR)) 
((EQ (CAR SEXPR) 'LAMBDA) 

(CPC-LAMBDA SEXPR ENV CONT)) 
((EQ (CAR SEXPR) 'IF) 

(CPC- IF SEXPR ENV CONT)) 
((EQ (CAR SEXPR) 'CATCH) 

(CPC-CATCH SEXPR ENV CONT)) 
((EQ (CAR SEXPR) 'LABELS) 

(CPC-LABELS SEXPR ENV CONT)) 
((AND (ATOM (CAR SEXPR)) 

(GET (CAR SEXPR) 'AMACRO)) 
(CPC (FUNCALL (GET (CAR SEXPR) 'AMACRO) SEXPR) ENV CONT)) 
(T (CPC-FORM SEXPR ENV CONT))))) 

CPC merely dispatches to one of a number of subsidiary routines based on the form 
of the expression SEXPR. ENV represents the environment in which SEXPR will be 
evaluated; it is a list of the variable names. When CPS initially calls CPC, 
ENV is NIL. CONT is the continuation which will receive the value of SEXPR. The 
double-quote (") is like a single-quote, except that within the quoted expression 
any subexpressions preceded by comma (,) are evaluated and substituted in (also, 
any subexpressions preceded by atsign (@) are substituted in a list segments). 
One special case handled directly by CPC is a quoted expression; CPC also 
expands any SCHEME macros encountered. 

(DEFINE CPC-ATOM 

(LAMBDA (SEXPR ENV CONT) 

((LAMBDA (AT) (IF CONT "(.CONT .AT) AT)) 
(COND ((NUMBERP SEXPR) SEXPR) 
((MEMQ SEXPR ENV) SEXPR) 
((GET SEXPR 'CPS-NAME)) 
(T (IMPLODE (CONS 'X (EXPLODEN SEXPR)))))))) 

For convenience, CPC-ATOM will change the name of a global atom. Numbers and 
atoms in the environment are not changed; otherwise, a specified name on the 
property list of the given atom is used (properties defined below convert "♦■ 



104 

into "++", etc.); otherwise, the name is prefixed with "X". Once the name has 
been converted, it is converted to a form which invokes the continuation on the 
atom. (If a null continuation is supplied, the atom itself is returned.) 

(DEFINE CPC-LAMBDA 

(LAMBDA (SEXPR ENV CONT) 
((LAMBDA (CN) 

((LAMBDA (LX) (IF CONT "( ,CONT ,LX) LX)) 
"(LAMBDA (g(CADR SEXPR) ,CN) 
,(CPC (CADDR SEXPR) 

(APPEND (CADR SEXPR) (CONS CN ENV)) 
CN)))) 
(GENTEMP 'C)))) 

A LAMBDA expression must have an additional parameter, the continuation supplied 
to its body, added to its parameter list. CN holds the name of this generated 
parameter. A new LAMBDA expression is created, with CN added, and with. its body 
converted in an environment containing the new variables. Then the same test for 
a null CONT is made as in CPC-ATOM. 

(DEFINE CPC-IF 

(LAMBDA (SEXPR ENV CONT) 
((LAMBDA (KN) 

"((LAMBDA (,KN) 

,(CPC (CADR SEXPR) 
ENV 
((LAMBDA (PN) 

"(LAMBDA (,PN) 

(IF ,PN 

.(CPC (CADDR SEXPR) 
ENV 
KN) 
.(CPC (CADDDR SEXPR) 
ENV 
KN)))) 



(GENTEMP '?)))) 



.CONT)) 
(GENTEMP 'K)))) 



First, the continuation for an IF must be given a name KN (rather, the name held 
in KN; but for convenience, we will continue to use this ambiguity, for the form 
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of the name is indeed Kn for some number n), for it will be referred to in two 
places and we wish to avoid duplicating the code. Then, the predicate is 
converted to continuation-passing style, using a continuation which will receive 
the result and call it PN. This continuation will then use an IF to decide which 
converted consequent to invoke. Each consequent is converted using continuation 
KN. 

(DEFINE CPC-CATCH 

(LAMBDA (SEXPR ENV CONT) 
((LAMBDA (EN) 

"((LAMBDA (,EN) 

((LAMBDA (,(CADR SEXPR)) 

,(CPC (CADDR SEXPR) 

(CONS (CADR SEXPR) ENV) 
EN)) 
(LAMBDA (V C) ( ,EN V)))) 
,CONT)) 
(GENTEMP 'E)))) 

This routine handles CATCH as defined in [Sussman 75], and in converting it to 
continuation-passing style eliminates all occurrences of CATCH. The idea is to 
give the continuation a name EN, and to bind the CATCH variable to a continuation 
(LAMBDA (V C) ...) which ignores its continuation and instead exits the catch by 
calling EN with its argument V. The body of the CATCH is converted using 
continuation EN. 

(DEFINE CPC-LABELS 

(LAMBDA (SEXPR ENV CONT) 

(DO ((X (CADR SEXPR) (CDR X)) 

(Y ENV (CONS (CAAR X) Y))) 
((NULL X) 
(DO ((W (CADR SEXPR) (CDR W)) 

(Z NIL (CONS (LIST (CAAR V) 

(CPC (CADAR W) Y NIL)) 
Z))) 
((NULL W) 
"(LABELS .(REVERSE Z) 

.(CPC (CADDR SEXPR) Y CONT)))))))) 
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Here we have used DO loops as defined in MacLISP (DO is implemented as a macro in 
SCHEME). There are two passes, one performed by each DO. The first pass merely 
collects in Y the names of all the labelled LAMBDA expressions. The second pass 
converts all the LAMBDA expressions using a null continuation and an environment 
augmented by all the collected names in Y, collecting them in Z. At the end, a 
new LABELS is constructed using the results in Z and a converted LABELS body. 

(DEFINE CPC-FORM 

(LAMBDA (SEXPR ENV CONT) 
(LABELS ((LOOP1 

(LAMBDA (X Y Z) 

(IF (NULL X) 

(DO ((F (REVERSE (CONS CONT Y)) 
(IF (NULL (CAR Z)) F 
(CPC (CAR Z) 
ENV 

"(LAMBDA (.(CAR Y)) ,F)))) 
(Y Y (CDR Y)) 
(Z Z (CDR Z))) 
((NULL Z) F)) 
(COND ((OR (NULL (CAR X)) 
(ATOM (CAR X))) 
(L00P1 (CDR X) 

(CONS (CPC (CAR X) ENV NIL) Y) 
(CONS NIL Z))) 
((EQ (CAAR X) 'QUOTE) 
(L00P1 (CDR X) 

(CONS (CAR X) Y) 
(CONS NIL Z))) 
((EQ (CAAR X) 'LAMBDA) 
(L00P1 (CDR X) 

(CONS (CPC (CAR X) ENV NIL) Y) 
(CONS NIL Z))) 
(T (LOOP1 (CDR X) 

(CONS (GENTEMP 'T) Y) 
(CONS (CAR X) Z)))))))) 
(LOOP1 SEXPR NIL NIL)))) 

This, the most complicated routine, converts forms (function calls). This also 
operates in two passes. The first pass, using LOOP1, uses X to step down the 
expression, collecting data in Y and Z. At each step, if the next element of X 
can be evaluated trivially, then it is converted with a null continuation and 
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added to Y, and NIL is added to Z. Otherwise, a temporary name TN for the result 
of the subexpression is created and put in Y, and the subexpression itself is put 
in Z. On the second pass (the DO loop), the final continuation-passing form is 
constructed in F from the inside out. At each step, if the element of Z is non- 
null, a new continuation must be created. (There is actually a bug in CPC-FORM, 
which has to do with variables affected by side-effects. This is easily fixed by 
changing LOOP1 so that it generates temporaries for variables even though 
variables evaluate trivially. This would only obscure the examples presented 
below, however, and so this was omitted.) 

(LABELS ((BAR 

(LAMBDA (DUMMY X Y) 

(IF (NULL X) • |CPS ready to go!| 

(BAR (PUTPROP (CAR X) (CAR Y) 'CPS-NAME) 
(CDR X) 
(CDR Y)))))) 
(BAR NIL 

'(+-*// ~ T NIL) 
'■(++ -- ** //// M 'T 'NIL))) 

This loop sets up some properties so that "+■ will translate into "++" instead of 
"%+", etc. 

Now let us examine some examples of the action of CPS. First, let us try 
our old friend FACT, the iterative factorial program. 

(DEFINE FACT 

(LAMBDA (N) 

(LABELS ((FACT1 (LAMBDA (M A) 

(IF (« M 0) A 

(FACT1 (- M 1) (* M A)))))) 
(FACT1 N 1)))) 

Applying CPS to the LAMBDA expression for FACT yields: 
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(#C0NT# 

(LAMBDA (N C7) 

(LABELS ((FACT1 

(LAMBDA (M A CIO) 
((LAMBDA (Kll) 

(%= M 

(LAMBDA (P12) 

(IF P12 (Kll A) 
(-- M 1 

(LAMBDA (T13) 
(** M A 

(LAMBDA (T14) 

(FACT1 T13 T14 Kll))))))))) 
CIO)))) 
(FACT1 N 1 C7)))) 

As an example of CATCH elimination, here is a routine which is a 
paraphrase of the 8QRT routine from [Sussman 75]: 

(DEFINE SORT 

(LAMBDA (X EPS) 

((LAMBDA (ANS L00PTA6) 

(CATCH RETURNTAG 

(BLOCK (ASET' LOOPTAG (CATCH M M)) 
(IF — 

(RETURNTAG ANS) 
NIL) 
(ASET' ANS — ) 
(LOOPTAG LOOPTAG)))) 
1.0 
NIL))) 

Here we have used "---" and "===" as ellipses for complicated (and relatively 
uninteresting) arithmetic expressions. Applying CPS to the LAMBDA expression for 
SQRT yields: 
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(#CONT# 
(LAMBDA (X EPS C33) 

((LAMBDA (ANS LOOPTAG C34) 
((LAMBDA (E35) 

((LAMBDA (RETURNTAG) 
((LAMBDA (E52) 

((LAMBDA (M) (E52 M)) 
(LAMBDA (V C) (E52 V)))) 
(LAMBDA (T51) 

(%ASET* LOOPTAG T51 
(LAMBDA (T37) 

((LAMBDA (A B C36) (B C36)) 
T37 
(LAMBDA (C40) 

((LAMBDA (K47) 

((LAMBDA (P50) 
(IF P50 

(RETURNTAG ANS K47) 
(K47 'NIL))) 

(LAMBDA (T42) 

((LAMBDA (A B C41) (B C41)) 
T42 
(LAMBDA (C43) 

(XASET' ANS %« = 

(LAMBDA (T45) 

((LAMBDA (A B C44) 
(B C44)) 
T45 

(LAMBDA (C46) 
(LOOPTAG 
LOOPTAG 
C46)) 
C43)))) 
C40)))) 
E35)))))) 
(LAMBDA (V C) (E35 V)))) 
C34)) 
1.0 
•NIL 
C33))) 

Note that the CATCHes have both been eliminated. It is left as an exercise for 
the reader to verify that the continuation-passing version correctly reflects the 
semantics of the original. 
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{Note Operations on Functions} 

It would certainly be possible to define other operations on functions, 
such as determining the number of arguments required, or the types of the 
arguments and returned value, etc. (Indeed, after the dissertation was written, 
it was decided to include such an operator PROCP in [Revised Report].) The point 
is that functions need not conform to a specific representation such as S- 
expressions. At a low level, it may be useful to think of invocation as a 
generic operator which dispatches on the particular representation and invokes 
the function in an appropriate manner. Similarly, a debugging package might need 
to be able to distinguish the various representations. At the user level, 
however, it is perhaps best to hide this issue, and answer a type inquiry with 
merely "function". 



{Note Refinement of RABBIT) 

Since the original dissertation was written I have continued to refine 
and improve RABBIT. This effort has included a complete rewriting of the 
optimizer to make it more efficienct and at the same time more lucid. It also 
included accommodation of changes to SCHEME as documented in [Revised Report]. 
This work has spanned perhaps eight months' time, because the availability of 
computer time restricted me to testing RABBIT only once or twice a night. Thus, 
the actual time expended for the improvements was much less than ten hours a 
week. 
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{Note Side-Effect Classifications} 

The division of side-effects into classes in RABBIT was not really 
necessary to the primary goals of RABBIT, but was undertaken as an interesting 
experiment for our own edification. One could easily imagine a more complex 
taxonomy. A case of particular interest not handled by RABBIT is dividing the 
ASET side-effect into ASET of each particular variable; thus an ASET on FOO 
would not affect a reference to the variable BAR. This could have been done in 
an ad hoc manner, but we are interested in a more general method dealing only 
with sets of effects and affectabilities. 



{Note Subroutinization} 

We have not said anything about how to locate candidate expressions for 
subroutinization. For examples of appropriate strategies, see [Geschke] and 
[Aho, Johnson, and Ullman]. Our point here is that SCHEME, thanks to the 
property of lexical scoping and the ability to write "anonymous" functions as 
LAMBDA-expressions, provides an ideal way to represent the result of such 
transformations . 
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{Note Tail-Recursive OR} 

Since the dissertation was written, the SCHEME language was redefined in 
[Revised Report] to prescribe a "tail-recursive" interpretation for the last form 
in an AND or OR. This requirement necessitated a redefinition of OR which is in 
fact dual to the definition of AND. 
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cruirMc / JL reS ? nt here the com P lete working source code for RABBIT, written in 
SCHEME. (The listing of the code was produced by the M @" listing generator, 
written by Richard M. Stallman, Guy L. Steele Jr., and other contributors.) 

The code is presented on successive odd-numbered pages. Commentary on 
the code is on the facing even-numbered page. An index appears at the end of the 
listing, indicating where each function is defined. 

It should be emphasized that RABBIT was not written with efficiency as a 
particular goal. Rather, the uppermost goals were clarity, ease of debugging 
and adaptability to changing algorithms during the development process Much 
information is generated, never used by the compilation process, and then thrown 
away, simply so that if some malfunction should occur it would be easier to 
conduct a post-mortem analysis. Information that is used for compilation is 
often retained longer than necessary. The overall approach is to create a big 
data structure and then, step by step, fill in slots, never throwing anything 
away, even though it may no longer be needed. 

u- u «I he algoritnms could b e increased in speed, particularly the optimizer, 
which often recomputes information needlessly. Determining whether or not the 
recomputation was necessary would have cluttered up the algorithms, however, 
making them harder to read and to modify, and so this was omitted. Similarly 
certain improvements could dramatically decrease the space used. The larger 
functions in RABBIT can just barely be compiled with a memory size of 256K words 

«!! a i m 1°' HoweVer ' U was deemed wor thwhile to keep the extra information 
available for as long a time as possible. 

The implementation of RABBIT has taken perhaps three man-months. This 
includes throwing away the original optimizer and rewriting it completely, and 
accomodating certain changes to the SCHEME language as they occurred. RABBIT was 
operational, without the optimizer, after about one man-month's work. The 
dissertation was written after the first version of the optimizer was 
demonstrated to work. The remaining time was spent analyzing the faults of the 
first optimizer, writing the second version, accomodating language changes, 
making performance measurements, and testing RABBIT on programs other than RABBIT 
use 1 1 . 
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The main modules of RABBIT are organized something like this: 



COMFILE, TRANSDUCE, PROCESS-FORM 
COMPILE 

ALPHATIZE 

MACRO-EXPAND 
META- EVALUATE 

PASS 1 -ANALYZE 
ENV- ANALYZE 
TRIV-ANALYZE 
EFFS-ANALYZE 
META-IF-FUDGE 
META-COMBINATION-TRIVFN 
META-COMBINATION-LAMBDA 
SUBST-CANDIDATE 
META-SUBSTITUTE 
CONVERT 
CENV-ANALYZE 
BIND-ANALYZE 
DEPTH-ANALYZE 
CLOSE-ANALYZE 
COMPILATE-ONE-FUNCTION 
COMPILATE 

COMP-BODY 

ANALYZE 

TRIV-ANALYZE 



(Bookkeeping and file handling) 

(Compile a function definition) 

(Convert input, rename variables) 

(Expand macro forms) 
(Source-to-source optimizations) 
(Preliminary code analysis) 
(Environment analysis) 
(Triviality analysis) 
(Side effects analysis) 
(Transform nested IF expressions) 
(Constants folding) 
( Beta-conversion ) 

(Substitution feasibility) 
(Substitution, subsumption) 
(Convert to continuation-passing style) 
(Environment analysis) 
(Bindings analysis) 
(Register allocation) 
(Environment structure design) 
(Generate code, producing one module) 
(Generate code for one subroutine) 
(Compile procedure body) 
(Generate value-producing code) 
(Generate "trivial" code) 
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The DECLARE forms are for the benefit of the MacLISP compiler, which will 
process the result of compiling this file (i.e. RABBIT compiling itself). The 
first few forms are concerned with switch settings, allocation of memory within 
the MacLISP compiler, and loading of auxiliary functions which must be available 
at compile time. 

The large block of SPECIAL declarations contains the name of every SCHEME 
function in the file. This is necessary because the run-time representation of a 
global variable is as a MacLISP SPECIAL variable. The compiled function objects 
will reside in MacLISP value cells, and SCHEME functions refer to each other 
through these cells. 

The second set of SPECIAL declarations (variables whose names begin and 
end with a "*") specify variables used globally by RABBIT. These fall into three 
categories: variables containing properties of the SCHEME interpreter which are 
parameters for the compiler (e.g. **ARGUMENT-REGISTERS**) ; switches, primarily 
for debugging purposes, used to control certain compiler operations (e.g. 
•FUDGE*); and own variables for certain functions, used to generate objects or 
gather statistics (e.g. *GENTEMPNUM* and *DEPROGNIFY-COUNT*). 

The PROCLAIM forms are to RABBIT as DECLARE forms are to the MacLISP 
compiler. These provide declarations to the incarnation of RABBIT which is 
compiling the file. The subforms of a PROCLAIM form are executed by RABBIT when 
it encounters the form in a file being compiled. (We will see later how this is 
done. ) 



001 ;;; RABBIT COMPILER -*-LISP-*- RABBIT 568 05/18/78 Pagel 

00 2 ~~ _ -"• 

003 (DECLARE (FASLOAD (QUUX) SCHMAC)) 

004 (DECLARE (MACROS T> (NEWIO T)) 

005 (DECLARE (ALLOC '(LIST (300000 450000 .2) FIXNUM 50000 SYMBOL 24000))) 

006 (DECLARE (DEFUN DISPLACE (X Y) Y)) 
007 

008 (DECLARE (SPECIAL EMPTY TRIVFN GENTEMP GENFLUSH GEN-GLOBAL-NAME PRINT-WARNING ADDPROP DELPROP SETPROP 

009 ADJOIN UNION INTERSECT REMOVE SET0IFF PAIRLIS COMPILE PASS1-ANALYZE TEST-COMPILE 

010 NODIFY ALPHATIZE ALPHA-ATOM ALPHA-LAMBDA ALPHA- IF ALPHA-ASET ALPHA-CATCH 

011 ALPHA-LABELS ALPHA-LABELS-DEFN ALPHA-BLOCK MACRO-EXPAND ALPHA-COMBINATION 

012 ENV-ANALYZE TRIV-ANALYZE TRIV-ANALYZE-FN-P EFFS-ANALYZE EFFS-UNION EFFS-ANALYZE- IF 

013 EFFS-ANALYZE-COMBINATION CHECK-COMBINATION-PEFFS ERASE-NODES META-EVALUATE 

014 META-IF-FUDGE META-COMBINATION-TRIVFN META-COMBINATION-LAMBDA SUBST-CANDIDATE 

015 REANALYZE1 EFFS- INTERSECT EFFECTLESS EFFECTLESS-EXCEPT-CONS PASSABLE 

016 META-SUBSTITUTE COPY-CODE COPY-NODES CNODIFY CONVERT MAKE-RETURN CONVERT-LAMBDA-FM 

017 CONVERT- IF CONVERT-ASET CONVERT-CATCH CONVERT-LABELS CONVERT-COMBINATION 

018 CENV-ANALYZE CENV-TRIV-ANALYZE CENV-CCOMBINATION-ANALYZE BIND-ANALYZE REFD-VARS 

019 BIND-ANALYZE-CLAMBDA BIND-ANALYZE-CONTINUATION BIND-ANALYZE-CIF BIND-ANALYZE-CASET 

020 BIND-ANALYZE-CLABELS BIND-ANALYZE-RETURN BIND-ANALYZE-CCOMBINATION 

021 BIND-CCOMBINATION-ANALYZE DEPTH-ANALYZE F ILTER-CLOSEREFS CLOSE-ANALYZE COMPILATE 

022 DEPROGNIFY1 TEMPLOC ENVCARCDR REGSLIST SET-UP-ASETVARS COMP-BODY PRODUCE- IF 

023 PRODUCE-ASET PRODUCE-LABELS PRODUCE-LAMBDA-COMBINATION PRODUCE-TRIVFN-COMBINATION 

024 PRODUCE-TRIVFN-COMBINATION-CONTINUATION PRODUCE-TRIVFN-COMBINATION-CVARIABLE 

025 PRODUCE -COMBINATION PRODUCE-COMBINATION-VARIABLE ADJUST-KNOWNFN-CENV 

026 PRODUCE-CONTINUATION-RETURN PRODUCE-RETURN PRODUCE-RETURN- 1 LAMBDACATE PSETQIFY 

027 PSETQIFY-METHOD-2 PSETQIFY-METHOD-3 PSETQ-ARGS PSETQ-ARGS-ENV PSETQ-TEMPS 

028 MAPANALYZE ANALYZE ANALYZE -CLAMBDA ANALYZE-CONTINUATION ANALYZE-CIF ANALYZE-CLABELS 

029 ANALYZE ^COMBINATION ANALYZE-RETURN LOOKUPICATE CONS-CLOSEREFS OUTPUT-ASET 

030 CONDICATE DECARCDRATE TRIVIALIZE TRIV-LAMBDACATE COMPILATE-ONE-FUNCTION 

031 COMPILATE-LOOP USED-TEMPLOCS REMARK-ON MAP-USER-NAMES COMFILE TRANSDUCE 

032 PROCESS-FORM PROCESS-DEF INE-FORM PROCESS-DEFINITION CLEANUP SEXPRFY CSEXPRFY 

033 CHECK-NUMBER-OF-ARGS DUMPIT STATS RESET-STATS INIT-RABBIT) ) 
034 

035 (DECLARE (SPECIAL *EMPTY* *GENTEMPNUM* *GENTEMPLIST* *GLOBAL-GEN-PREF IX* *ERROR-COUNT* *ERROR-LIST* 

036 *TEST* *TESTING* *OPTIMIZE* *REANALYZE* *SUBSTITUTE* *FUDGE* *NEW-FUDGE* 

037 *SINGLE-SUBST* *LAMBDA-SUBST* *FLUSH-ARGS* *STAT-VARS* *DEAD-COUNT* *FUDGE-COUNT* 

038 *FOLD-COUNT* *FLUSH-COUNT* *CONVERT-COUNT* *SUBST-COUNT* *DEPROGNIFY-COUNT* 

039 *LAMBDA-BODY-SUBST* *LAMBDA-BO0Y-SUBST-TRY-COUNT* *LAMBDA-BODY-SUBST-SUCCESS-COUNT* 

040 *CHECK-PEFFS* **CONT+ARG-REGS** **ENV+CONT+ARG-REGS** **ARGUMENT-REGISTERS** 

041 **NUMBER-OF-ARG-REGS** *BUFFER-RANDOM-FORMS* *DISPLACE-SW*) ) 
042 

043 (PROCLAIM (*EXPR PRINT-SHORT) 

044 (SET' *BUFFER-RANDOM-FORMS* NIL) 

045 (ALLOC '(LIST (240000 340000 1000) FIXNUM (30000 40000 1000) 

04 6 SYMBOL (14000 24000 NIL) HUNK4 (20000 53000 NIL) 

047 HUNK8 (20000 50000 NIL) HUNK16 (20000 60000 NIL)))) 
048 

049 (SET' *STAT-VARS* • (*DEAD-COUNT* *FUDGE-COUNT* *FOLD-COUNT* *FLUSH-COUNT* *CONVERT-COUNT* 

050 *SUBST-COUNT* *DEPROGNIFY-COUNT* *LAMBDA-BODY-SUBST-TRY-COUNT* 

051 *LAMBDA-BODY-SUBST-SUCCESS-COUNT*)) 
052 

053 (ALLOC '(LIST (240000 340000 1000) FIXNUM (30000 40000 1000) 

054 SYMBOL (14000 24000 NIL) HUNK4 (20000 50000 NIL) 

055 HUNK8 (20000 50000 NIL) HUNK16 (20000 70000 NIL))) 
056 

057 (APPLY 'GCTWA '(T)) ;GC USELESS ATOMS (CAN'T SAY (EVAL' (GCTWA T)) BECAUSE OF NCOMPLR) 

058 (REPLACE) ; UNDO ANY DISPLACED MACROS 

059 (SET* *DISPLACE-SW* NIL) ;DON'T LET MACROS SELF-DISPLACE 

060 (GRINDEF) ; LOAD THE GRINDER (PRETTY-PRINTER) 
061 

062 (DECLARE (/PDEFINE DEFINE |SCHEME FUNCTION|)) ;DECLARATIONS FOR LISTING PROGRAM 

063 (DECLARE (/gDEFINE DEFMAC |MACLISP MACRO|)) 

064 (DECLARE (/PDEFINE SCHMAC |PDP-10 SCHEME MACRO|)) 

065 (DECLARE (/^DEFINE MACRO |SCHEME MACRO |)') 
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The variable *ENPTY* is initialized to a unique object (a list cell whose 
car is *EMPTY* -- this is so that no other object can be EQ to it, but it can be 
easily recognized when printed) which is used to initialize components of 
structures. (We will see later how such structures are defined.) We do not use, 
say, NIL to represent an empty component because NIL might be a meaningful value 
for that component. The predicate EMPTY is true of the unique object. 

TRIVFN is a predicate which is true of "trivial" functions. A function 
is trivial if it is a MacLISP primitive (an EXPR, SUBR, or LSUBR), or has been 
declared to be primitive via a *EXPR or *LEXPR proclamation. 

(INCREMENT F00) expands into the code (ASET' F00 (+ F00 1)). 

CATENATE is a utility macro which may be thought of as a function. Given 
any number of S-expressions it produces an atomic symbol whose print name is the 
concatenation of the print names of the S-expressions. Usually the S-expressions 
will be atomic symbols or numbers. 

(CATENATE 'F00 '- 43) r> FOO-43 

GENTEMP is used to generate a new unique symbol, given a specified 
prefix. The global variable *GENTEMPNUM* starts at zero and increases 
monotonicially. Each call to GENTEMP catenates the prefix, a hyphen, and a new 
value of *GENTEMPNUM*. Because the numeric suffixes of the generated symbols 
increase with time, one can determine in which order symbols were generated. We 
also will use different prefixes for different purposes, so that one can tell 
which part of the compiler generated a given symbol. This information can be 
invaluable for debugging purposes; from the names of the symbols appearing in a 
data structure, one can determine how that structure was created and in what 
order. (The generated symbols are themselves used primarily as simple markers, 
or as simple structures (property lists). The use of the print names amounts to 
tagging each marker or structure with a type and a creation timestamp. A LISP- 
like language encourages the inclusion of such information.) 

(GENTEMP 'NODE) => NODE-2534 

A list of all generated symbols is maintained in *GENTEMPLIST*. GENFLUSH 
can be called to excise all generated symbols from the MacLISP obarray; this is 
periodically necessary when compiling a large file so that unneeded symbols may 
be garbage-collected. The symbols are initially interned on the obarray in the 
first place for ease of debugging (one can refer to them by name from a debugging 
breakpoint). GEN-GLOBAL-NAME is used to generate a symbol to be used as a run- 
time name by the compiled code. The prefix for such names is initially "?" for 
testing purposes, but is initialized by the file transducer as a function of the 
name of the file being compiled. This allows separately compiled files to be 
loaded together without fear of naming conflicts. 



001 RABBIT 568 05/1 5/78 Page 2 

002 (COND ((NOT (BOUNDP '*EMPTY*)) ~ ~ 

003 (SET* *EMPTY* (LIST '*EMPTY*)))) 
004 

005 (DEFINE EMPTY 

006 (LAMBDA (X) (EQ X *EMPTY*))) 
007 

008 

009 (DEFINE TRIVFN 

010 (LAMBDA (SYM) 

OH (GETL SYM '(EXPR SUBR LSUBR *EXPR *LEXPR)))) 

012 

013 

014 (DEFMAC INCREMENT (X) "(ASET' t X (+ ,X 1))) 

015 

016 (DEFMAC CATENATE ARGS 

017 "(IMPLODE (APPEND e(MAPCAR '(LAMBDA (X) 

018 (COND ((OR (ATOM X) (NOT (EQ (CAR X) 'QUOTE))) 

019 "(EXPLODEN ,X)) 

020 (T "(QUOTE .(EXPLODEN (CADR X)))))) 

021 ARGS)))) 
022 

023 

024 (COND ((NOT (BOUNDP ' *GENTEMPNUM* )) 

025 (SET' *GENTEMPNUM* 0))) 
026 

027 (COND ((NOT (BOUNDP '*GENTEMPLIST* ) ) 

028 (SET' *GENTEMPLIST* NIL))) 
029 

030 (DEFINE GENTEMP 

031 (LAMBDA (X) 

032 (BLOCK (INCREMENT *GENTEMPNUM*) 

033 (LET ((SYM (CATENATE X '|-| *GENTEMPNUM*))) 

034 (ASET' *GENTEMPLIST* (CONS SYM *GENTEMPLIST*) ) SYM)))) 

036 (DEFINE GENFLUSH 

037 (LAMBDA () 

038 (BLOCK (AMAPC REMOB *GENTEMPLIST*) 
°39 (ASET' *GENTEMPLIST* NIL)))) 
040 

041 (DEFINE GEN-GLOBAL-NAME 

042 (LAMBDA () (GENTEMP *GLOBAL-GEN-PREFIX*))) 
043 

044 (SET' *GLOBAL-GEN-PREFIX* '|?|) 
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WARN is a macro used to print a notice concerning an incorrect program 
being compiled. It generates a call to PRINT-WARNING, which maintains a count 
and a list of the error messages, and prints the message, along with any 
associated useful quantities. 

(WARN |FOO is greater than BAR| F00 BAR) 

would print (assuming the values of F00 and BAR were 43 and 15) 

; Warning: FOO is greater than BAR 
; 43 
; 15 

WARN is used only to report errors in the program being compiled. The MacLISP 
ERROR function is used to signal internal inconsistencies in the compiler. 

ASK is a macro which prints a message and then waits for a reply. 
Typically NIL means "no", and anything else means "yes". 

SX and CSX are debugging aids which print intermediate data structures 
internal to the compiler in a readable form. They make use of SPRINTER (part of 
the MacLISP GRIND pretty-printing package) and of SEXPRFY and CSEXPRFY, which are 
defined below. 

The EQCASE macro provides a simple dispatching control structure. The 
first form evaluates to an item, and the clause whose keyword matches the item is 
executed. If no clause matches, an error occurs. For example: 

(EQCASE TRAFFIC-LIGHT 

(RED (PRINT 'STOP)) 
(GREEN (PRINT 'GO)) 
(YELLOW (PRINT 'ACCELERATE) (CRASH))) 

expands into the code: 

(COND ((EQ TRAFFIC-LIGHT 'RED) (PRINT 'STOP)) 
((EQ TRAFFIC-LIGHT 'GREEN) (PRINT 'GO)) 
((EQ TRAFFIC-LIGHT 'YELLOW) (PRINT 'ACCELERATE) (CRASH)) 
(T (ERROR 'ILosing EQCASEI TRAFFIC-LIGHT 'FAIL-ACT))) 
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002 (DEFMAC WARN (MSG . STUFF) ~ " 

003 "(PRINT-WARNING ' ,MSG (LIST BSTUFF))) 
004 

005 (DEFINE PRINT-WARNING 

006 (LAMBDA (MSG STUFF) 

007 (BLOCK (INCREMENT *ERROR-COUNT* ) 

008 (ASET' *ERROR-LIST* (CONS (CONS MSG STUFF) *ERROR-LIST* ) ) 

009 (TYO 7 (SYMEVAL 'TYO)) ;BELL 

010 (TERPRI (SYMEVAL 'TYO)) 

011 (PRINC 'hWarning: | (SYMEVAL 'TYO)) 

012 (TYO 7 (SYMEVAL 'TYO)) ;BELL 

013 (PRINC MSG (SYMEVAL 'TYO)) 

014 (AMAPC PRINT-SHORT STUFF)))) 
015 

016 (DEFUN PRINT-SHORT (X) 

017 ((LAMBDA (PRINLEVEL PRINLENGTH TERPRI) 

018 (TERPRI (SYMEVAL 'TYO)) 

°19 (PRINC '|; | (SYMEVAL 'TYO)) 

020 (PRIN1 X (SYMEVAL 'TYO))) 

021 3 8 T)) 
022 

023 

024 (SCHMAC ASK (MSG) 

025 "(BLOCK (TERPRI) (PRINC '.MSG) (TYO 40) (READ))) 
026 

027 

028 (DEFMAC SX (X) "(SPRINTER (SEXPRFY ,X NIL))) ;DEBUGGING AID 

029 (DEFMAC CSX (X) "(SPRINTER (CSEXPRFY .X))) ;DEBUGGING AID 
030 

031 

032 (DEFMAC EQCASE (OBJ . CASES) 

033 "(COND §(MAPCAR '(LAMBDA (CASE) 

034 (OR (ATOM (CAR CASE)) 

035 (ERROR 'iLosing EQCASE clause|)) 

036 "((EO .OBJ '.(CAR CASE)) *(C0R CASE))) 

3 7 CASES) 

038 (T (ERROR 'ILosing EOCASE| ,OBJ 'FAIL-ACT)))) 
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The next group of macros implement typed data structures with named 
components. ACCESSFN, CLOBBER, and HUNKFN allow definition of very general 
structure access functions. Their precise operation is not directly relevant to 
this exposition; suffice it to say that they are subsidiary to the DEFTYPE macro 
on the next page. 



DEFTYPE defines structure "data types" with named components. These 
structures are implemented as MacLISP hunks. (A hunk is essentially a kind of 
list cell with more than two pointer components; it may be thought of as a 
short, fixed-length vector. Hunks are accessed with the function (CXR n hunk), 
which returns the nth component of the hunk. (RPLACX n hunk newval) analogously 
alters the nth component. CXR and RPLACX are thus similar to CAR/CDR and 
RPLACA/RPLACD. ) 

Slot of each hunk is reserved for a "property list"; this feature is 
not used in RABBIT. Slot 1 always contains an atomic symbol which is the name of 
the type. Thus every structure explicitly bears its type. The form (HUNKFN TYPE 
1) creates a function (actually a macro) called TYPE which when applied to a hunk 
will fetch slot 1. Slots 2 upward of a hunk are used to contain named 
components. A structure does not contain the component names. (However, the 
symbol which is the name of the type does have a list of the component names on 
its property list. This is useful for debugging purposes. There is, for 
example, a package which pretty-prints structured data types, showing the 
components explicitly as name-value pairs, which uses this information.) 
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033 



(DECLARE (/^DEFINE ACCESSFN |ACCESS MACRO|)) 
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(OEFMAC ACCESSFN (NAME UVARS FETCH . PUT) 
((LAMBDA (VARS CNAME) 

(DO ((A VARS (CDR A)) 

(B '*Z* "(CDR ,B)) 
(C NIL (CONS "(CAR ,B) C))) 
((NULL A) 
"(PROGN 'COMPILE 

(DEFMAC .NAME *Z* 

((LAMBDA .(NREVERSE (COR (REVERSE VARS))) 
.FETCH) 
P( REVERSE (CDR C)))) 
(DEFMAC .CNAME *Z* 

((LAMBDA .VARS 

.(COND (PUT (CAR PUT)) 

(T ""(CLOBBER , .FETCH 

.THE-NEW-VALUE)))) 
^(REVERSE C))))))) 
(COND (PUT UVARS) 

(T (APPEND UVARS '(THE-NEW-VALUE)))) 
(CATENATE '|CLOBBER-| NAME))) 

(DEFMAC CLOBBER (X Y) 

"(.(CATENATE '|CLOBBER-| (CARX)) ?(C0R X) ,Y)) 

(DECLARE (/^DEFINE HUNKFN |HUNK ACCESS MACROj)) 

(DEFMAC HUNKFN (NAME SLOT) 

"(ACCESSFN .NAME (THE-HUNK NEW-VALUE) 
"(CXR ..SLOT .THE-HUNK) 
"(RPLACX ..SLOT , THE-HUNK .NEW-VALUE))) 
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Consider for example the form 

(DEFTYPE LAMBDA (UVARS VARS BODY)) 

This defines a structured data type called LAMBDA with three named components 
UVARS, VARS, and BODY. It also defines a series of macros for manipulating this 
data type. 

For access, the macros LAMBDA\UVARS, LAMBDA\VARS, and LAMBDA\BODY are 
defined. These each take a single argument, a data structure of type VARIABLE, 
and return the appropriate component. (The TYPE function can also be applied to 
the data object, and will return LAMBDA.) 

For construction, a macro CONS-LAMBDA is defined. For example, the form: 

(CONS-LAMBDA (UVARS = LIST1) 
(VARS = LIST2)) 

would construct a LAMBDA structure with the TYPE, UVARS, VARS, and BODY slots 
initialized respectively to LAMBDA, the value of LIST1, the value of LIST2, and 
the "empty object" (recall the EMPTY predicate above). Any component names 
(possibly none!) may be initialized in a CONS-xxx form, and any components not 
mentioned will be initialized to the empty object. (The "=" signs are purely 
syntactic sugar for mnemonic value. They can be omitted.) 

For alteration of components, a macro ALTER-LAMBDA is defined. For 
example, the form 

(ALTER-LAMBDA FOO 

(UVARS := LIST1) 
(BODY := (LIST A B))) 

would alter the UVARS and BODY components of the value of FOO (which should be a 
LAMBDA structure - this is not checked) to be respectively the values of LIST1 
and (LIST A B). Any non-zero number of components may be modified by a single 
ALTER-xxx form. (The " : = " signs are purely syntactic sugar also.) 

A great advantage of using these structure definitions is that it is very 
easy to add or delete components during the development of the program. In 
particular, when a new component is added to a type, it is not necessary to find 
all instances of creations of objects of that type; they will simply 
automatically initialize the new slot to the empty object. Only parts of the 
program which are relevant to the use of the new component need be changed. 
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(DECLARE (/§DEFINE DEFTYPE |DATA TYPE | ) ) — 

;;; SLOT IS ALWAYS THE PROPERTY LIST, AND SLOT 1 THE HUNK TYPE. 
(HUNKFN TYPE 1) 

(DEFMAC DEFTYPE (NAME SLOTS SUPP) 
"(PROGN 'COMPILE 

(DEFMAC .(CATENATE '|CONS-| NAME) KWDS 
(PROGN (DO ((K KWDS (CDR K))) 
((NULL K)) 

(OR ,(COND ((CDR SLOTS) "(MEMO (CAAR K) '.SLOTS)) 
(T "(EQ (CAAR K) '.(CAR SLOTS)))) 
(ERROR ', (CATENATE '| Invalid Keyword Argument to CONS-| 
NAME) 
(CAR K) 
'FAIL-ACT))) 
"(HUNK '.'.NAME 

8(DO ((S ', SLOTS (CDR S)) 
(X NIL 

(CONS ((LAMBDA (KWD) 

(COND (KWD (CAR (LAST KWD))) 
(T '*EMPTY*))) 
(ASSQ (CAR S) KWDS)) 
X))) 
((NULL S) (NREVERSE X))) 
NIL))) 
(DEFMAC .(CATENATE ' | ALTER- | NAME) (OBJ . KWDS) 
(PROGN (DO ((K KWDS (CDR K))) 
((NULL K)) 

(OR ,(COND ((CDR SLOTS) "(MEMO (CAAR K) '.SLOTS)) 
(T "(EQ (CAAR K) '.(CAR SLOTS)))) 
(ERROR '.(CATENATE '| Invalid Keyword Argument to ALTER- 1 
NAME) 
(CAR K) 
•FAIL-ACT))) 
(DO ((I (+ (LENGTH KWDS) 1) (- I 1)) 
(VARS NIL (CONS (GENSYM) VARS))) 
((= I 0) 
"((LAMBDA .VARS 

.(BLOCKIFY 
(MAPCAR '(LAMBDA (K V) 

"(CLOBBER (.(CATENATE • .NAME 

(CAR K)) 
(.(CAR VARS))) 
(.V))) 
KWDS 

(CDR VARS)))) 
(LAMBDA () .OBJ) 

*(MAPCAR '(LAMBDA (K) "(LAMBDA () .(CAR (LAST K)))) 
KWDS)))))) 
P(DO ((S SLOTS (CDR S)) 
(N 2 (+ N 1)) 

(X NIL (CONS "(HUNKFN .(CATENATE NAME ' |\| (CAR S)) 
.N) 
X))) 
((NULL S) (NREVERSE X))) 
(DEFPROP .NAME .SLOTS COMPONENT- NAMES) 
(DEFPROP .NAME .SUPP SUPPRESSED-COMPONENT-NAMES) 
'(TYPE .NAME DEFINED))) 
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On this page are two groups of utility functions. One group manipulates 
property lists, and the other manipulates sets of objects represented as lists. 

For (ADDPROP SYM VAL PROP), the PROP property of the symbol SYM should be 
a list of things. The object VAL is added to this list if it is not already a 
member of the list. 

DELPROP performs the inverse of ADDPROP; it removes an object from a 
list found as the property of a symbol. 

(SETPROP SYN VAL PROP) puts the property-value pair PROP, VAL on the 
property list of SYN; but if SYM already has a PROP property, it is an error 
unless the new value is the same as (EQ to) the existing one. That is, a 
redundant SETPROP is permitted, but not a conflicting one. 

(ADJOIN ITEM SET) produces a new set SET U {ITEM}. 
UNION produces the union of two sets. 
INTERSECT produces the intersection of two sets. 
(REMOVE ITEM SET) produces a new set SET - {ITEM}. 
(SETDIFF SET1 SET2) produces the set SET1 - SET2. 

All of the set operations are accomplished non-destructively; that is, 
the given arguments are not modified. Examples: 

(ADJOIN 'A '(A B C)) => (A B C) 
(ADJOIN 'A »(B C D)) => (A B C D) 
(UNION '(A B C) '(B D F)) => (D F A B C) 
(INTERSECT '(A B C) '(B D F)) => (B) 
(REMOVE 'B '(A B C)) => (A C) 
(SETDIFF '(A B C) «(B D F)) => (A C) 
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(DEFINE ADDPROP 

(LAMBDA (SYM VAL PROP) 

(LET ((L (GET SYM PROP))) 

(IF (NOT (MEMO VAL L)) 

(PUTPROP SYM (CONS VAL L) PROP))))) 

;;; INVERSE OF ADDPROP 

(DEFINE DELPROP 

(LAMBDA (SYM VAL PROP) 

(PUTPROP SYM (DELO VAL (GET SYM PROP)) PROP))) 

;;; LIKE PUTPROP, BUT INSIST ON NOT CHANGING A VALUE ALREADY THERE 

(DEFINE SETPROP 

(LAMBDA (SYM VAL PROP) 

(LET ((L (GETL SYM (LIST PROP)))) 

(IF (AND L (NOT (EQ VAL (CADR L)))) 

(ERROR ' |Attempt to redefine a unique property! 
(LIST 'SETPROP SYM VAL PROP) 
•FAIL-ACT) 
(PUTPROP SYM VAL PROP))))) 

;;; OPERATIONS ON SETS/REPRESENTED AS LISTS 

(DEFINE ADJOIN 

(LAMBDA (X S) 

(IF (MEMO X S) S (CONS X S)))) 

(DEFINE UNION 

(LAMBDA (X Y) 

(DO ((Z Y (CDR Z)) 

(V X (ADJOIN (CAR Z) V))) 
((NULL Z) V)))) 

(DEFINE INTERSECT 

(LAMBDA (X Y) 

(IF (NULL X) 
NIL 
(IF (MEMQ (CAR X) Y) 

(CONS (CAR X) (INTERSECT {CDR X) Y)) 
(INTERSECT (CDR X) Y))))) 

(DEFINE REMOVE 

(LAMBDA (X S) 

(IF (NULL S) 
S 

(IF (EQ X (CAR S)) 
(CDR S) 
((LAMBDA (Y) 

(IF (EQ Y (CDR S)) S 
(CONS (CAR S) Y))) 
(REMOVE X (CDR S))))))) 

(DEFINE SETDIFF 

(LAMBDA (X Y) 

(DO ((Z X (CDR Z)) 

(W NIL (IF (MEMQ (CAR Z) Y) 
W 

(CONS (CAR Z) V)))) 
((NULL Z) W)))) 
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The PAIRLIS function is similar to, but not identical to, the function of 
the same name in the LISP 1.5 Manual. The difference is that the pairs of the 
association list produced are Z-lists rather than single conses. This was done 
purely so that structures produced by PAIRLIS would be more readable when 
printed; the ease of debugging was considered worth the additional CONS and 
access time. 

(PAIRLIS '(A B C) '(X Y Z) '((F P) (G Q))) 
= > ((C Z) (B Y) (A X) (F P) (G Q)) 

The COMPILE function is the main top-level function of the compiler. It 
is responsible for invoking each phase of the compiler in order. NAME is the 
name of a function (an atomic symbol), and LAMBDA-EXP the corresponding lambda- 
expression; these are easily extracted, for example, from a SCHEME DEFINE-form. 
SEE-CRUD is NIL for normal processing, or T for debugging purposes. OPTIMIZE is 
a switch controlling whether the optimization phase should be invoked; it can be 
T, NIL, or MAYBE (meaning to ask the (human) debugger). 

The overall flow within COMPILE is as follows: check number of 
arguments; apply ALPHATIZE to the lambda-expression to produce the pass 1 data 
structure; optionally optimize this data structure; perform pass 1 analysis; 
convert the pass 1 data structure to a pass 2 (continuation-passing style) data 
structure; perform pass 2 analysis; generate code. The value of COMPILE is the 
MacLISP code produced by the code generator. 

PASS1 -ANALYZE is a separate function so that it can be used by the 
optimizer to re-analyze newly created subexpressions. 

CL is a debugging utility. (CL FOO) causes the function FOO (which 
should be defined in the running SCHEME into which the compiler has been loaded) 
to be compiled. Various debugging facilities, such as SEE-CRUD, are enabled. 
This is done by using TEST-COMPILE. 
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RABBIT 568 
(DEFINE PA1RLIS 

(LAMBDA (LI L2 L) 

(DO ((V LI (CDR V)) 
(U L2 (CDR U)) 

(E L (CONS (LIST (CAR V) (CAR U)) E))) 
((NULL V) E)))) 



(DEFINE COMPILE 

(LAMBDA (NAME LAMBDA-EXP SEE-CRUD OPTIMIZE) 
(BLOCK (CHECK-NUMBER-OF-ARGS NAME 

(LENGTH (CADR LAMBDA-EXP)) 
T) 
(LET ((ALPHA-VERSION (ALPHATIZE LAMBDA-EXP NIL))) 

(IF (AND SEE-CRUD (ASK |See alpha-conversion?! ) ) 

(SX ALPHA-VERSION)) 
(LET ((OPT (IF (EQ OPTIMIZE 'MAYBE) 
(ASK |Optimize?|) 
OPTIMIZE))) 
(LET ((META-VERSION 
(IF OPT 

(META-EVALUATE ALPHA-VERSION) 
(PASS1-ANALYZE ALPHA-VERSION NIL NIL)))) 
(OR (AND (NULL (NODEXREFS META-VERSION)) 

(NULL (NODEXASETS META-VERSION))) 
(ERROR ' |ENV-ANALYZE lost - COMPILE! 
NAME 

'FAIL-ACT)) 
(IF (AND SEE-CRUD OPT (ASK |See meta-evaluation? | ) ) 

(SX META-VERSION)) 
(LET ((CPS-VERSION (CONVERT META-VERSION NIL (NOT (NULL OPT))))) 
(IF (AND SEE-CRUD (ASK |See CPS-conversion? | ) ) 

(CSX CPS-VERSION)) 
(CENV-ANALYZE CPS-VERSION NIL NIL) 
(BIND-ANALYZE CPS-VERSION NIL NIL) 
(DEPTH-ANALYZE CPS-VERSION 0) 
(CLOSE -ANALYZE CPS-VERSION NIL) 
(COMPILATE-ONE-FUNCTION CPS-VERSION NAME)))))))) 

(DEFINE PASS1-ANALYZE 

(LAMBDA (NODE REDO OPT) 

(BLOCK (ENV-ANALYZE NODE REDO) 
(TRIV-ANALYZE NODE REDO) 
(IF OPT (EFFS-ANALYZE NODE REDO)) 
NODE ) ) ) 

(SCHMAC CL (FNNAME) "(TEST-COMPILE \FNNAME)) 

(DEFINE TEST-COMPILE 

(LAMBDA (FNNAME) 

(LET ((FN (GET FNNAME 'SCHEME 1FUNCTION) ) ) 
(COND (FN (ASET' *TESTING* T) 

(ASET' *TEST* NIL) ;PURELY TO RELEASE FORMER GARBAGE 
(ASET' *ERROR-COUNT* 0) 
(ASET' *ERROR-LIST* NIL) 

(ASET' *TEST* (COMPILE FNNAME FN T 'MAYBE)) 
(SPRINTER *TEST*) 

"(,(IF (ZEROP *ERROR-COUNT*) 'NO *ERROR-COUNT*) ERRORS)) 
(T "(.FNNAME NOT DEFINED)))))) 
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Here are the structured data types used for the pass 1 intermediate 
representation. Each piece of the program is represented as a NODE, which has 
various pieces of information associated with it. The FORM component is a 
structure of one of the types CONSTANT, VARIABLE, LAMBDA, IF, ASET, CATCH, 
LABELS, or COMBINATION. This structure holds information specific to a given 
type of program node, whereas the NODE structure itself holds information which 
is needed at every node of the program structure. (One may think of the FORM 
component as a PASCAL record variant.) 

The ALPHATIZE routine and its friends take the S-expression definition of 
a function (a lambda-expression) and make a copy of it using NODE structures. 
This copy, like the S-expression, is a tree. Subsequent analysis routines will 
all recur on this tree, passing information up and down the tree, either 
distributing information from parent node to child nodes, or collating 
information from child nodes to pass back to parent nodes. Some information must 
move laterally within the tree, from branch to branch; this is accomplished 
exclusively by using the property lists of symbols, usually those generated for 
renamings of variables (since all lateral information is associated with variable 
references - which is no accident!). 

The function NODIFY is used for constructing a node, with certain slots 
properly initialized. In particular, the METAP slot is initialized to NIL, 
indicating a node not yet processed by M ETA- EVALUATE; this fact will be used 
later in the optimizer. A name is generated for the node, and the node is put on 
the property list of the name. This property is for debugging purposes only; 
given the name of a node one can get the node easily. The name itself may also 
be used for another purpose by CONVERT- COMB I NAT ION, to represent the intermediate 
quantity which is the value of the form represented by the node. 
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HERE WE RENAME ALL VARIABLES, AND CONVERT THE EXPRESSION TO AN EQUIVALENT TREE-LIKE FORM 
WITH EXTRA SLOTS TO BE FILLED IN LATER. AFTER THIS POINT, THE NEW NAMES ARE USED FOR 
VARIABLES, AND THE USER NAMES ARE USED ONLY FOR ERROR MESSAGES AND THE LIKE. THE TREE-LIKE 
FORM WILL BE USED AND AUGMENTED UNTIL IT IS CONVERTED TO CONTINUATION-PASSING STYLE. 

WE ALSO FIND ALL USER-NAMED LAMBDA-FORMS AND SET UP APPROPRIATE PROPERTIES. 
THE USER CAN NAME A LAMBDA-FORM BY WRITING (LAMBDA (X) BODY NAME). 



DEFTYPE 



NODE ( 
NAME : 
SEXPR: 

ENV: 

REFS: 

ASETS: 

TRIVP: 

EFFS: 

AFFD: 

PEFFS 

PAFFD 

METAP 

SUBSTP 

FORM: 



NAME SEXPR ENV REFS ASETS TRIVP EFFS AFFD PEFFS PAFFD METAP SUBSTP FORM) (SEXPR)) 

A GENSYM WHICH NAMES THE NODE'S VALUE 

THE S-EXPRESSION WHICH WAS ALPHATIZED TO MAKE THIS NODE 

(USED ONLY FOR WARNING MESSAGES AND DEBUGGING) 

THE ENVIRONMENT OF THE NODE (USED ONLY FOR DEBUGGING) 

ALL VARIABLES BOUND ABOVE AND REFERENCED BELOW OR BY THE NODE 

ALL LOCAL VARIABLES SEEN IN AN ASET BELOW THIS NODE (A SUBSET OF REFS) 

NON-NIL IFF EVALUATION OF THIS NODE IS TRIVIAL 

SET OF SIDE EFFECTS POSSIBLY OCCURRING AT THIS NODE OR BELOW 

SET OF SIDE EFFECTS WHICH CAN POSSIBLY AFFECT THIS NODE OR BELOW 

ABSOLUTELY PROVABLE SET OF EFFS 

ABSOLUTELY PROVABLE SET OF AFFD 

NON-NIL IFF THIS NODE HAS BEEN EXAMINED BY THE META-EVALUATOR 
:FLAG INDICATING WHETHER META-SUBSTITUTE ACTUALLY MADE A SUBSTITUTION 

ONE OF THE BELOW TYPES 



(DEFTYPE CONSTANT (VALUE)) 

;VALUE: THE S-EXPRESSION VALUE OF THE CONSTANT 
(DEFTYPE VARIABLE (VAR GLOBALP)) 

VAR: THE NEW UNIQUE NAME FOR THE VARIABLE, GENERATED BY ALPHATIZE. 

THE USER NAME AND OTHER INFORMATION IS ON ITS PROPERTY LIST. 
GLOBALP: NIL UNLESS THE VARIABLE IS GLOBAL (IN WHICH CASE VAR IS THE ACTUAL NAME) 
(DEFTYPE LAMBDA (UVARS VARS BODY)) 

;UVARS: THE USER NAMES FOR THE BOUND VARIABLES (STRICTLY FOR OEBUGGING (SEE SEXPRFY)) 
;VARS: A LIST OF THE GENERATED UNIQUE NAMES FOR THE BOUND VARIABLES 
;BODY: THE NODE FOR THE BODY OF THE LAMBDA-EXPRESSION 
(DEFTYPE IF (PRED CON ALT)) 

;PRED: THE NODE FOR THE PREDICATE 
;CON: THE NODE FOR THE CONSEQUENT 
;ALT: THE NODE FOR THE ALTERNATIVE 
(DEFTYPE ASET (VAR BODY GLOBALP)) 

VAR: THE GENERATED UNIQUE NAME FOR THE ASET VARIABLE 
BODY: THE NODE FOR THE BODY OF THE ASET 

GLOBALP: NIL UNLESS THE VARIABLE IS GLOBAL (IN WHICH CASE VAR IS THE ACTUAL NAME) 
(DEFTYPE CATCH (UVAR VAR BODY)) 

UVAR: THE USER NAME FOR THE BOUND VARIABLE (STRICTLY FOR DEBUGGING (SEE SEXPRFY)) 
VAR: THE GENERATED UNIQUE NAME FOR THE BOUND VARIABLE 
BODY: THE NODE FOR THE BODY OF THE CATCH 
(DEFTYPE LABELS (UFNVARS FNVARS FNDEFS BODY)) 

;UFNVARS: THE USER NAMES FOR THE BOUND LABELS VARIABLES 
;FNVARS: A LIST OF THE GENERATED UNIQUE NAMES FOR THE LABELS VARIABLES 
;FNDEFS: A LIST OF THE NODES FOR THE LAMBDA-EXPRESSIONS 
;BODY: THE NODE FOR THE BOY OF THE LABELS 
(DEFTYPE COMBINATION (ARGS WARNP)) 

;ARGS: A LIST OF THE NODES FOR THE ARGUMENTS (THE FIRST IS THE FUNCTION) 

;WARNP: NON-NIL IFF CHECK-COMBTNATION-PEFFS HAS DETECTED A CONFLICT IN THIS COMBINATION 

(DEFINE NODIFY 

(LAMBDA (FORM SEXPR ENV) 

(LET ((N (CONS-NODE (NAME = (GENTEMP 'NODE)) 
(FORM = FORM) 
(SEXPR = SEXPR) 
(ENV = ENV) 
(METAP = NIL)))) 
(PUTPROP. (NODENNAME N) N 'NODE) 
N))) 
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ALPHATIZE takes an S-expression to convert, and an environment. The 
latter is a list of 2-lists; each 2-list is of the form (user-name new-name). 
This is used for renaming each variable to a unique name. The unique names are 
generated within ALPHA-LAMBDA, ALPHA-LABELS, and ALPHA-CATCH, where the variable 
bindings are encountered. The new name pairings are tacked onto the front of the 
then-current environment, and the result used as the environment for converting 
the body. 

ALPHATIZE merely does a dispatch on the type of form, to one of the sub- 
functions for the various types. It also detects forms which are really macro 
calls, and expands them by calling MACRO-EXPAND, which returns the form to be 
used in place of the macro call. (BLOCK is handled as a separate special case. 
In the interpreter, BLOCK is handled specially rather than going through the 
general MACRO mechanism. This is done purely for speed. Defining BLOCK as a 
macro in the compiler can confuse the interpreter in which the compiler runs, and 
so it was decided simply to handle BLOCK as a special case in the compiler also.) 
ALPHATIZE allows the S-expression to contain already converted code in the form 
of NODEs; this fact is exploited by the optimizer (see META-IF-FUDGE below), but 
has no use in the initial conversion. 

ALPHA-ATOM creates a CONSTANT structure for numbers and the special 
symbols NIL and T. Otherwise a VARIABLE structure is created. If the symbol (it 
better be a symbol!) occurs in the environment, the new-name is used, and 
otherwise the symbol itself. The slot GLOBALP is set to T iff the symbol was not 
in the environment. 

ALPHA-LAMBDA generates new names for all the bound variables. It then 
converts its body, after using PAIRLIS to add the user-name/new-name pairs to the 
environment. The result is used to make a LAMBDA structure. A copy is made of 
the list of variables in the UVARS slot; it must be copied because later J1ETA- 
COMBINATION-LAMBDA may splice out elements of that list. If so, it will also 
splice out corresponding members of VARS, but that list was freshly consed by 
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002 ;;; ON NODE NAMES THESE PROPERTIES ARE CREATED: 

003 ;;; NODE THE CORRESPONDING NODE 
004 

005 (DEFINE ALPHATIZE 

006 (LAMBDA (SEXPR ENV) 

007 (COND ((ATOM SEXPR) 

008 (ALPHA-ATOM SEXPR ENV)) 

009 ((HUNKP SEXPR) 

010 (IF (EQ (TYPE SEXPR) 'NODE) 
OH SEXPR 

012 (ERROR ^Peculiar hunk - ALPHATIZE| SEXPR 'FAIL-ACT))) 

013 ((EQ (CAR SEXPR) 'QUOTE) 

014 (NOOIFY (CONS-CONSTANT (VALUE = (CADR SEXPR))) SEXPR ENV)) 

015 ((EQ (CAR SEXPR) 'LAMBDA) 
°1& (ALPHA-LAMBDA SEXPR ENV)) 

017 ((EQ (CAR SEXPR) 'IF) 

018 (ALPHA-IF SEXPR ENV)) 
°19 ((EQ (CAR SEXPR) 'ASET) 

020 (ALPHA-ASET SEXPR ENV)) 

021 ((EQ (CAR SEXPR) 'CATCH) 

022 (ALPHA-CATCH SEXPR ENV)) 

023 ((EQ (CAR SEXPR) 'LABELS) 

024 (ALPHA-LABELS SEXPR ENV)) 
°25 ((EQ (CAR SEXPR) 'BLOCK) 
°26 (ALPHA-BLOCK SEXPR ENV)) 
°27 ((AND (ATOM (CAR SEXPR)) 

°28 (EQ (GET (CAR SEXPR) 'AINT) 'AMACRO)) 

°29 (ALPHATIZE (MACRO-EXPAND SEXPR) ENV)) 

030 (T (ALPHA-COMBINATION SEXPR ENV))))) 
031 

032 (DEFINE ALPHA-ATOM 

033 (LAMBDA (SEXPR ENV) 

034 (IF (OR (NUMBERP SEXPR) (NULL SEXPR) (EQ SEXPR 'T)). 

035 (NODIFY (CONS-CONSTANT (VALUE = SEXPR)) SEXPR ENV) 

036 (LET ((SLOT (ASSQ SEXPR ENV))) 

037 (NODIFY (CONS-VARIABLE (VAR = (IF SLOT (CADR SLOT) SEXPR)) 

038 (GLOBALP = (NULL SLOT))) 

039 SEXPR 

04 ENV))))) 
041 

042 (DEFINE ALPHA-LAMBDA 

043 (LAMBDA (SEXPR ENV) 

044 (LET ((VARS (DO ((I (LENGTH (CADR SEXPR)) (- I 1)) 

045 (V NIL (CONS (GENTEMP 'VAR) V))) 

046 ((=10) (NREVERSE V))))) 

047 (IF (CDDDR SEXPR) 

048 (WARN |Ma1formed LAMBDA exprcssionl SEXPR)) 

049 (NODIFY (CONS-LAMBDA (UVARS = (APPEND (CADR SEXPR) NIL)) 

050 ;;SEE META-COMBINATION-LAMBDA 

051 (VARS = VARS) 

052 (BODY = (ALPHATIZE (CADDR SEXPR) 
°^ 3 (PAIRLIS (CADR SEXPR) 



054 
055 

056 SEXPR 

057 ENV)))) 



VARS 
ENV)))) 
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ALPHA- IF simply converts the predicate, consequent, and alternative, and 
makes an IF structure. 

ALPHA-ASET checks for a non-quoted first argument. (Presently RABBIT 
does not allow for computed ASET variables. Since RABBIT was written, such 
computed variables have in fact been banned from the SCHEME language [Revised 
Report].) For simplicity, it also does not allow altering a global variable 
which is the name of a MacLISP primitive. This restriction is related only to 
the kluclginess of the PDP-10 MacLISP SCHEME implementation, and is not an 
essential problem with the language. The ERROR function was used here rather 
than WARN because the problems are hard to correct for and occur infrequently. 
Aside from these difficulties, ALPHA-ASET is much like ALPHA-ATOM on a variable; 
it looks in the environment, converts the body, and then constructs an ASET 
structure. 

ALPHA-CATCH generates a new name "CATCHVAR-nn" for the bound variable, 
tacks it onto the environment, and converts the body; it then constructs a CATCH 
structure. 

ALPHA-LABELS generates new names "FNVAR-n" for all the bound variables; 
it then constructs in LENIV the new environment, using PAIRLIS. It then converts 
all the bound function definitions and the body, using this environment. In this 
way all the function names are apparent to all the functions. A LABELS structure 
is then created. 
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(DEFINE ALPHA- IF 

(LAMBDA (SEXPR ENV) 

(NODIFY (CONS-IF (PRED = (ALPHATIZE (CADR SEXPR) ENV)) 
(CON = (ALPHATIZE (CADDR SEXPR) ENV)) 
(ALT « (ALPHATIZE (CADDDR SEXPR) ENV))) 
SEXPR 
ENV))) 

(DEFINE ALPHA-ASET 

(LAMBDA (SEXPR ENV) 

(LET ((VAR (COND ((OR (ATOM (CADR SEXPR)) 

(NOT (EQ (CAADR SEXPR) 'QUOTE))) 
(ERROR '|Can't Compile Non-quoted ASET Variable) 
SEXPR 

'FAIL-ACT)) 
(T (CADADR SEXPR))))) 
(LET ((SLOT (ASSQ VAR ENV))) 

(IF (AND (NULL SLOT) (TRIVFN VAR)) 

(ERROR 'III legal to ASET a MacLISP primitive! 
SEXPR 

•FAIL-ACT)) 
(NODIFY (CONS-ASET (VAR = (IF SLOT (CADR SLOT) VAR)) 
(GLOBALP = (NULL SLOT)) 
(BODY = (ALPHATIZE (CADDR SEXPR) ENV))) 
SEXPR 
ENV))))) 

(DEFINE ALPHA-CATCH 

(LAMBDA (SEXPR ENV) 

(LET ((VAR (GENTEMP 'CATCHVAR))) 

(NODIFY (CONS-CATCH (VAR = VAR) 

(UVAR = (CADR SEXPR)) 

(BODY = (ALPHATIZE (CADDR SEXPR) 

(CONS (LIST (CADR SEXPR) VAR) 
ENV)))) 
SEXPR 
ENV)))) 

(DEFINE ALPHA-LABELS 

(LAMBDA (SEXPR ENV) 

(LET ((UFNVARS (AMAPCAR (LAMBDA (X) 

(IF (ATOM (CAR X)) 
(CAR X) 
(CAAR X))) 
(CADR SEXPR)))) 
(LET ((FNVARS (DO ((I (LENGTH UFNVARS) (- I 1)) 

(V NIL (CONS (GENTEMP 'FNVAR) V))) 
((= I 0) (NREVERSE V))))) 
(LET ((LENV (PAIRLIS UFNVARS FNVARS ENV))) 

(NODIFY (CONS-LABELS (UFNVARS = UFNVARS) 
(FNVARS = FNVARS) 
(FNDEFS = (AMAPCAR 

(LAMBDA (X) 

(ALPHA-LABELS-DEFN X LENV)) 
(CADR SEXPR))) 
(BODY * (ALPHATIZE (CADDR SEXPR) LENV))) 



SEXPR 
ENV)))))) 
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ALPHA-LABELS-DEFN parses one LABELS definition clause. An extension to 
the SCHEME language (made just after the publication of [Revised Report]!) 
allows a LABELS definition to take on any of the same three forms permitted by 
DEFINE. Thus this LABELS form actually defines FOO, BAR, and BAZ to be 
equivalent functions: 

(LABELS ((FOO (LAMBDA (X Y) (BLOCK (PRINT X) ( + X Y)))) 
(BAR (X Y) (PRINT X) (+ X Y)) 
((BAZ X Y) (PRINT X) (+ X Y))) 
(LIST (FOO 1 2) (BAR 1 2) (BAZ 1 2))) 

ALPHA-BLOCK implements the standard macro definition of BLOCK. (BLOCK x) 
is simply x, and (BLOCK x . y) expands into: 

((LAMBDA (A B) (B)) x (LAMBDA () (BLOCK . y))) 

MACRO-EXPAND takes a macro call and expands it into a new form to be used 
in place of the macro call. In the PDP-10 MacLISP SCHEME implementation there 
are three different kinds of macros. Types MACRO and AMACRO are defined by 
MacLISP code, and so their defining functions are invoked using the MacLISP 
primitive FUNCALL. Type SMACRO is defined by SCHEME code which is in the value 
cell of an atomic symbol; thus SYMEVAL is used to get the contents of the value 
cell, and this SCHEME function is then invoked. 

ALPHA-COMBINATION converts all the subforms of a combination, making a 
list of them, and creates a COMBINATION structure. If the function position 
contains a variable, it performs a consistency check using CHECK-NUMBER-OF-ARGS 
to make sure the right number of arguments is present. 
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(DEFINE ALPHA-LABELS-DEFN 

(LAMBDA (LDEF LENV) 

(ALPHATIZE (IF (ATOM (CAR LDEF)) 
(IF (CODR LDEF) 

"(LAMBDA ,(CADR LDEF) .(BLOCKIFY (CDDR LDEF))) 
(CADR LDEF)) 
"(LAMBDA ,(CDAR LDEF) .(BLOCKIFY (CDR LDEF)))) 
LENV))) 

(DEFINE ALPHA-BLOCK 

(LAMBDA (SEXPR ENV) 

(COND ((NULL (COR SEXPR)) 

(WARN | BLOCK with no f orms | 

"(ENV = .(AMAPCAR CAR ENV))) 
(ALPHATIZE NIL ENV)) 
(T (LABELS ((MUNG 

(LAMBDA (BODY) 

(IF (NULL (CDR BODY)) 
(CAR BODY) 

"( (LAMBDA (A B) (B)) 
.(CAR BODY) 

(LAMBDA () .(MUNG (CDR BODY)))))))) 
(ALPHATIZE (MUNG (CDR SEXPR)) ENV)))))) 

(DEFINE MACRO-EXPAND 

(LAMBDA (SEXPR) 

(LET ((M (GETL (CAR SEXPR) '(MACRO AMACRO SMACRO)))) 
(IF (NULL M) 

(BLOCK (WARN |missing macro def inition| SEXPR) 

"(ERROR '(Undefined Macro Form| ' .SEXPR 'FAIL-ACT)) 
(EQCASE (CAR M) 

(MACRO (FUNCALL (CADR M) SEXPR)) 
(AMACRO (FUNCALL (CADR M) SEXPR)) 
(SMACRO ((SYMEVAL (CADR M)) SEXPR))))))) 

(DEFINE ALPHA-COMBINATION 

(LAMBDA (SEXPR ENV) 

(LET ((N (NODIFY (CONS-COMBINATION 
(WARNP = NIL) 

(ARGS = (AMAPCAR (LAMBDA (X) (ALPHATIZE X ENV)) 
SEXPR))) 
SEXPR 
ENV))) 
(LET ((M (NODENFORM (CAR (COMBINATION\ARGS (NODE\FORM N)))))) 
(IF (AND (EQ (TYPE M) 'VARIABLE) 
(VARIABLE\GLOBALP M)) 
(CHECK-NUMBER-OF-ARGS 
(VARIABLEWAR M) 

(LENGTH (COR (COMBINATION\ARGS (NODE\FORM N)))) 
NIL)) 
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Once the S-expression function definition has been copied as a NODE tree, 
COMPILE calls PASS1-ANALYZE to fill in various pieces of information. (If 
optimization is to be performed, COMPILE instead calls META-EVALUATE. META- 
EVALUATE in turn calls PASS1-ANALYZE in a coroutining manner we will examine 
later.) PASS1-ANALYZE in turn calls ENV-ANALYZE, TRIV-ANALYZE, and EFFS-ANALYZE 
in order. Each of these has roughly the same structure. Each takes a node and a 
flag called REDOTHIS. Normally REDOTHIS is NIL and the information has not yet 
been installed in the node, and so the routine proceeds to analyze the node and 
install the appropriate information. 

When invoked by the optimizer, however, there may be information in the 
node already, but that information may be incorrect or obsolete as a result of 
the optimizing transformations. If REDOTHIS is non-NIL, then the given node must 
be reanalyzed, even if the information is already present. If REDOTHIS is in 
fact the symbol ALL, then all descendants of the given node must be reanalyzed. 
Otherwise, only the given node requires re-analysis, plus any descendants which 
have not had the information installed at all. We will see later how these 
mechanisms are used in the optimizer. 

The purpose of ENV-ANALYZE is to fill in for each node the slots REFS and 
ASETS. The first is a set (represented as a list) of the new-names of all 
variables bound above the node and referenced at or below the node, and the 
second (a subset of the first) is a set of such names which appear in an ASET at 
or below the node. These lists are computed recursively. A CONSTANT node has no 
such references; a VARIABLE node (with GLOBALP = NIL) refers to its own 
variable. An ASET node adds its variable to the ASET list for its body. Most 
other kinds of nodes merely merge together the lists for their immediate 
descendants. In order to satisfy the "bound above the node" requirement, those 
structures which bind variables (LAMBDA, CATCH, LABELS) filter out their own 
bound variables from the two sets. 

As an example, consider this function: 

(LAMBDA (X) 

((LAMBDA (Y) 

((LAMBDA (W) 

(ASET' Z (* X Y))) 
(ASET' Y (- Y 1)))) 
(- X 3))) 

The node for (- X 3) would have a REFS list (X) and an ASET list (). The node 
for the ASET on Z would have REFS=(X Y) (or perhaps (Y X)) and ASETS=(); Z does 
not appear in the ASETS list because it is not bound above. The node for the 
combination ((LAMBDA (W) ...) ...) would have REFS=(X Y) and ASETS=(Y). The 
node for the lambda-expression (LAMBDA (Y) ...) would have REFS=(X) and 
ASETS=(), because Y is filtered out. 
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FOR NODES ENCOUNTERED WE FILL IN: 

REFS 

ASETS 

ON VARIABLE NAMES THESE PROPERTIES ARE CREATED: 

BINDING THE NODE WHERE THE VARIABLE IS BOUND ' 

USER-NAME THE USER'S NAME FOR THE VARIABLE (WHERE BOUND) 

READ-REFS VARIABLE NODES WHICH READ THE VARIABLE 

WRITE-REFS ASET NODES WHICH SET THE VARIABLE 

NORMALLY. ON RECURRING TO A LOWER NODE WE STOP IF THE INFORMATION 
IS ALREADY THERE. MAKING THE PARAMETER "REDOTHIS" BE "ALL" FORCES 
RE-COMPUTATION TO ALL LEVELS; MAKING IT "ONCE" FORCES 
RECOMPUTATION OF THIS NODE BUT NOT OF SUBNODES. 

(DEFINE ENV-ANALYZE 

(LAMBDA (NODE REDOTHIS) 

(IF (OR REDOTHIS (EMPTY (NODE\REFS NODE))) 
(LET ((FM (NODE\FORM NODE)) 

(REDO (IF (EQ REOOTHIS 'ALL) 'ALL NIL))) 
(EQCASE (TYPE FM) 
(CONSTANT 
(ALTER-NODE NODE 

(REFS := NIL) 
(ASETS := NIL))) 
(VARIABLE 
(ADDPROP (VARIABLEWAR FM) NODE 'READ-REFS) 
(IF (VARIABLE\GLOBALP FM) 

(SETPROP (VARIABLEWAR FM) (VARIABLEWAR FM) 'USER-NAME)) 
(ALTER-NODE NODE 

(REFS := (AND (NOT ( VARIABLE\GLOBALP FM)) 

(LIST (VARIABLEWAR FM)))) 
(ASETS := NIL))) 
(LAMBDA 
(DO ((V (LAMBDA WARS FM) (COR V)) 

(UV (LAMBDAXUVARS FM) (CDR UV))) 
((NULL V)) 

(SETPROP (CAR V) (CAR UV) 'USER-NAME) 
(SETPROP (CAR V) NODE 'BINDING)) 
(LET ((B (LAMB0A\BO0Y FM))) 
(ENV-ANALYZE B REDO) 
(ALTER-NODE NODE 

(REFS := (SETDIFF (NODE\REFS B) 

(LAMBDA WARS FM))) 
(ASETS 



(IF 



(SETDIFF (NODE\ASETS B) 

(LAMBDA WARS FM)))))) 



(LET ((PRED (IFXPRED FM)) 
(CON (IF\CON FM)) 
(ALT (IF\ALT FM))) 
(ENV-ANALYZE PRED REDO) 
(ENV-ANALYZE CON REDO) 
(ENV-ANALYZE ALT REDO) 
(ALTER-NODE NODE 
(REFS 



(ASET 

(LET ((B (ASET\BODY FM)) 
(V (ASETWAR FM))) 
(ENV-ANALYZE B REDO) 
(ADDPROP V NODE 'WRITE-REFS) 
(IF (ASET\GLOBALP FM) 
(ALTER-NODE NODE 



« (UNION (NODENREFS PRED) 

(UNION (NODENREFS CON) 

(NODENREFS ALT)))) 
(ASETS := (UNION (NODEXASETS PRED) 

(UNION (NODEXASETS CON) 

(NODEXASETS ALT))))))) 
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It should be easy to see the the topmost node of the node-tree must have 
REFS=() and ASETS=(), because no variables are bound above it. This fact is used 
in COMPILE for a consistency check. (After writing this last sentence, I noticed 
that in fact this consistency check was not being performed, and that it was a 
good idea. On being installed, this check immediately caught a subtle bug in the 
optimizer. Consistency checks pay off!) 

Another purpose accomplished by ENV-ANALYZE is the installation of 
several useful properties on the new-names of bound variables. Two properties, 
READ-REFS and WRITE-REFS, accumulate for each variable the set of VARIABLE nodes 
which refer to it and the set of ASET nodes that refer to it. These lists are 
very important to the optimizer. A non-empty WRITE-REFS set also calls for 
special action by the code generator. 

When a LAMBDA node is encountered, that node is put onto each new-name 
under the BINDING property, and the user-name is put under the USER-NAME 
property; these are used only for debugging. 
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070 (REFS '":«"( NODEXREFS B)) 

071 (ASETS := (NOOEXASETS B))) 

072 (ALTER-NO0E NODE 

073 (REFS := (ADJOIN V (NODE\REFS B))) 

074 (ASETS := (ADJOIN V (NODE\ASETS B))))))) 

075 (CATCH 

076 (LET ((B (CATCH\BODY FM)) 

077 (V (CATCHWAR FM))) 

078 (SETPROP V (CATCHXUVAR FM) 'USER-NAME) 

079 (SETPROP V NODE 'BINDING) 

080 (ENV-ANALYZE B REDO) 

081 (ALTER-NODE NODE 

082 (REFS := (REMOVE V (NODEXREFS B))) 

083 (ASETS := (REMOVE V (NOOEXASETS B)))))) 

084 (LABELS 

08 5 (DO ((V (LABELSXFNVARS FM) (CDR V)) 

086 (UV (LABELSXUFNVARS FM) (CDR UV)) 

087 ( D (LABELSXFNDEFS FM) (CDR D)) 

088 (R NIL (UNION R (NODE\REFS (CAR D)))) 
°89 (A NIL (UNION A (NODE\ASETS (CAR D))))) 

090 ((NULL V) 

091 (LET ((B (LABELSXBODY FM))) 

092 (ENV-ANALYZE B REDO) 

093 (ALTER-NODE NODE 

094 (REFS := (SETDIFF 

095 (UNION R (NODEXREFS B)) 

096 (LABELSXFNVARS FM))) 

097 (ASETS := (SETDIFF 

098 (UNION A (NODE\ASETS B)) 

099 (LABELSXFNVARS FM)))))) 

100 (SETPROP (CAR V) (CAR UV) 'USER-NAME) 

101 (SETPROP (CAR V) NODE 'BINDING) 

102 (ENV-ANALYZE (CAR D) REDO))) 

103 (COMBINATION 

1° 4 (LET ((ARGS (COMBINATIONXARGS FM))) 

105 (AMAPC (LAMBDA (X) (ENV-ANALYZE X REDO)) ARGS) 

1° 6 (DO ((A ARGS (CDR A)) 

107 (R NIL (UNION R (NOOEXREFS (CAR A)))) 

108 (S NIL (UNION S (NODEXASETS (CAR A))))) 

109 ((NULL A) 

11° (ALTER-NODE NODE 

HI (REFS := R) 

112 (ASETS :- S))))))))))) 
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TRIV-ANALYZE fills in the TRIVP slot for each node. This is a flag 
which, if non-NIL, indicates that the code represented by that node and its 
descendants is "trivial", i.e. it can be executed as simple host machine 
(MacLISP) code because no SCHEME closures are involved. Constants and variables 
are trivial, as are combinations with trivial arguments and a provably trivial 
function. While lambda-expressions are in general non-trivial (because a closure 
must be constructed), a special case is made for ((LAMBDA ...) ...), i.e. a 
combination whose function is a lambda-expression. This is possible because the 
code generator will not really generate a closure for the lambda-expression. 
This is the first example of a trichotomy we will encounter repeatedly. 
Combinations are divided into three kinds: those with a lambda-expression in the 
function position, those with a trivial MacLISP primitive (satisfying the 
predicate TRIVFN) in the function position, and all others. 

All other expressions are, in general, trivial iff all their subparts are 
trivial. Note that a LABELS is trivial iff its body is trivial; the non- 
triviality of the bound functions does not affect this. 

The triviality flag is used by phase 2 to control conversion to 
continuation-passing style. This in turn affects the code generator, which 
compiles trivial forms straightforwardly into MacLISP code, rather than using the 
more complex techniques required by non-trivial SCHEME code. It would be 
possible to avoid triviality analysis entirely; the net result would only be 
less optimal final code. 



A COMBINATION IS TRIVIAL IFF ALL ARGUMENTS ARE TRIVIAL, AND 
THE FUNCTION CAN BE PROVED TO BE TRIVIAL. WE ASSUME CLOSURES 
TO BE NON-TRIVIAL IN THIS CONTEXT, SOTHAT THE CONVERT FUNCTION 
WILL BE FORCED TO EXAMINE THEM. 
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002 ;;; TRIVIALITY ANALYSIS " 
003 

004 ;;; FOR NODES ENCOUNTERED WE FILL IN: 

005 ;;; TRIVP 
006 
007 
008 
009 
010 
Oil 

012 (DEFINE TRIV-ANALYZE 

013 (LAMBDA (NODE REDOTHIS) 

014 (IF (OR REDOTHIS (EMPTY (NODEXTRIVP NODE))) 

015 (LET ((FM (NODEXFORM NODE)) 

016 (REDO (IF (EQ REDOTHIS 'ALL) 'ALL NIL))) 

017 (EQCASE (TYPE FM) 

018 (CONSTANT 

019 (ALTER-NODE NODE (TRIVP := T))) 

020 (VARIABLE 

021 (ALTER-NODE NODE (TRIVP := T))) 

022 (LAMBDA 

023 (TRIV-ANALYZE (LAMBDA\BODY FM) REDO) 

024 (ALTER-NODE NODE (TRIVP := NIL))) 

025 (IF 

026 (TRIV-ANALYZE ( IF\PRED FM) REDO) 

027 (TRIV-ANALYZE ( IF\CON FM) REOO) 

028 (TRIV-ANALYZE (IF\ALT FM) REDO) 

029 (ALTER-NODE NODE 

030 (TRIVP := (AND (NODEXTRIVP ( IF\PRED FM)) 

031 (NODEXTRIVP (IF\CON FM)) 

032 (NODEXTRIVP (IF\ALT FM)))))) 

033 (ASET 

034 (TRIV-ANALYZE (ASET\BODY FM) REOO) 

03 5 (ALTER-NODE NODE (TRIVP := (NODEXTRIVP (ASETXBODY FM))))) 

036 (CATCH 

3 7 (TRIV-ANALYZE (CATCH\BODY FM) REDO) 

3 8 (ALTER-NODE NODE (TRIVP := NIL))) 

039 (LABELS 

040 (AMAPC (LAMBDA (F) (TRIV-ANALYZE F REDO)) 

041 (LABELSXFNDEFS FM)) 

042 (TRIV-ANALYZE (LABELS\BODY FM) REDO) 

043 (ALTER-NOOE NODE (TRIVP := NIL))) 

044 (COMBINATION 

045 (LET ((ARGS (COMBINATION\ARGS FM))) 

046 (TRIV-ANALYZE (CAR ARGS) REDO) 
°47 (DO ((A (CDR ARGS) (CDR A)) 

°48 (SW T (AND SW (NODEXTRIVP (CAR A))))) 

049 ((NULL A) 

050 (ALTER-NODE NODE 

051 (TRIVP :■ (AND SW 

052 (TRIV-ANALYZE-FN-P 

053 (CAR ARGS)))))) 

054 (TRIV-ANALYZE (CAR A) REDO))))))))) 
055 

056 (DEFINE TRIV-ANALYZE-FN-P 

057 (LAMBDA (FN) 

058 (OR (AND (EQ (TYPE (NODE\FORM FN)) 'VARIABLE) 

059 (TRIVFN ( VARIABLEWAR (NODE\FORM FN)))) 

060 (AND (-EQ (TYPE (NODE\FORM FN)) 'LAMBDA) 

°61 (NODEXTRIVP (LAMBDA\BODY (NODEXFORM FN))))))) 
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EFFS-ANALYZE analyzes the code for side-effects. In each node the four 
slots EFFS, AFFD, PEFFS, and PAFFD are filled in. Each is a set of side effects, 
which may be the symbol NONE, meaning no side effects; ANY, meaning all possible 
side effects; or a list of specific side effect names. Each such name specifies 
a category of possible side effects. Typical names are ASET, RPLACD, and FILE 
(which means input/output transactions). 

The four slots EFFS, AFFD, PEFFS, and PAFFD refer to the node they are in 
and all nodes beneath it. Thus each is computed by taking the union of the 
corresponding sets of all immediate descendants, then adjoining any effects due 
to the current node. 

EFFS is the set of side effects which may possibly be caused at or below 
the current node; PEFFS is the set of side effects which can be proved to occur 
at or below the node. These may differ because of ignorance on RABBIT'S part. 
For example, the node for a combination (RPLACA A B) will have the side-effect 
name RPLACA adjoined to both EFFS and PEFFS, because the RABBIT knows that RPLACA 
causes an RPLACA side effect (how this is known will be discussed later). On the 
other hand, for a combination (F00 A B), where F00 is some user function, RABBIT 
can only conjecture that F00 can cause any conceivable side effect, but cannot 
prove it. Thus EFFS will be forced to be ANY, while PEFFS will not. 

AFFD is the set of side effects which can possibly affect the evaluation 
of the current node or its descendants. For example, an RPLACA side effect can 
affect the evaluation of (CAR X), but on the other hand an RPLACD side effect 
cannot. PAFFD is the corresponding set of side effects for which it can be 
proved. (This set is "proved" in a less rigorous sense than for PEFFS. The name 
RPLACA would be put in the PAFFD set for (CAR X), even though the user might know 
that while there are calls to RPLACA in his program, none of them ever modify X. 
PEFFS and PAFFD are only used by CHECK-COMBINATION-PEFFS to warn the user of 
potential conflicts anyway, and serve no other purpose. EFFS and AFFD, on the 
other hand, are used by the optimizer to prevent improper code motion. Thus EFFS 
and AFFD must be pessimistic, and err only on the safe side; while PEFFS and 
PAFFD are optimistic, so that the user will not be pestered with too many warning 
messages . ) 

The CONS side effect is treated specially. A node which causes the CONS 
side effect must not be duplicated, because each instance will create a new 
object; but whereas two RPLACA side effects may not be executed out of order, 
two CONS side effects may be. 

The computation of AFFD and PAFFD for variables depends on whether the 
variable is global or not. If it is, SETQ and RPLACD can affect it (RPLACD can 
occur because of the peculiarities of the PDP-10 MacLISP implementation); 
otherwise, ASET can affect it if indeed any ASET refers to it (in which case ENV- 
ANALYZE will have left a WRITE-REFS property); otherwise, nothing can affect it. 
Similar remarks hold for the computation of EFFS and PEFFS for an ASET node. The 
name SETQ applies to modifications of global variables, while ASET applies to 
local variables. 
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SIDE-EFFECTS ANALYSIS 

FOR NODES ENCOUNTERED WE FILL IN: EFFS, AFFD, PEFFS, PAFFD 

A SET OF SIDE EFFECTS MAY BE EITHER 'NONE OR 'ANY, OR A SET. 



(DEFINE EFFS-ANALYZE 

(LAMBDA (NODE REDOTHIS) 

(IF (OR REDOTHIS (EMPTY (N0DE\EFFS NODE))) 
(LET ((FM (NODEXFORM NODE)) 

(REDO (IF (EO REOOTHIS 'ALL) 'ALL NIL))) 
(EQCASE (TYPE FM) 
(CONSTANT 
(ALTER-NODE 



NODE 
(EFFS : 
(AFFD : 
(PEFFS 
(PAFFD 



= -NONE) 
= 'NONE) 
:= 'NONE) 
:= 'NONE))) 
(VARIABLE 
(LET ((A (COND ( (VARIABLEXGLOBALP FM) '(SETQ)) 

((GET (VARIABLEWAR FM) 'WRITE-REFS) '(ASET)) 
(T 'NONE)))) 
(ALTER-NODE NODE 

(EFFS := 'NONE) 
(AFFD := A) 
(PEFFS := 'NONE) 
(PAFFD := A)))) 
(LAMBDA 
(EFFS-ANALYZE (LAMBDA\BODY FM) REDO) 
(ALTER-NODE NODE 

(EFFS := '(CONS)) 
(AFFD := NIL) 
(PEFFS := '(CONS)) 
(PAFFD := NIL))) 
(IF (EFFS-ANALYZE-IF NODE FM REDO)) 
(ASET 
(EFFS-ANALYZE (ASET\BODY FM) REDO) 
(LET ((ASETEFFS (IF (ASEAGLOBALP FM) 
'(SETQ) 
'(ASET)))) 
(ALTER-NODE NODE' 

(EFFS := (EFFS-UNION ASETEFFS 

(NODEXEFFS (ASEABODY FM)))) 
(AFFD := (NODEXAFFD (ASEABODY FM))) 
(PEFFS := (EFFS-UNION ASETEFFS 

(NODEXPEFFS (ASEABODY FM)))) 
(PAFFD := (NODEXPAFFD (ASEABODY FM)))))) 
(CATCH 
(EFFS-ANALYZE (CATCH\BODY FM) REDO) 
(ALTER-NODE NODE 
(EFFS 
(AFFD 
(PEFFS 
(PAFFD 
(LABELS 
(AMAPC (LAMBDA (F) (EFFS-ANALYZE F REDO)) 

(LABELSXFNDEFS FM)) 
(EFFS-ANALYZE (LABELS\BODY FM) REDO) 
(ALTER-NODE NODE 

(EFFS := (EFFS-UNION '(CONS) 

(NODEXEFFS (LABELSXBODY FM)))) 
(AFFD := (NODEXAFFD (LABELSXBODY FM))) 
(PEFFS := (EFFS-UNION '(CONS) 

(NODEXPEFFS (LABELSXBODY FM)))) 
(PAFFD := (NODEXPAFFD (LABELSXBODY FM))))) 
(COMBINATION 
(EFFS-ANALYZE-COMBINATION NODE FM REDO))))))) 



= (NODEXEFFS (CATCHXBODY FM))) 
= (NODEXAFFO (CATCHXBODY FM))) 
:= (NODEXPEFFS (CATCHXBODY FM))) 
:= (NODEXPAFFD (CATCHXBODY FM))))) 
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(While it may be held that allowing ASET' on variables is unclean, and 
that the use of cells as in PLASMA is semantically neater, it is true that 
because of the lexical scoping rules it can always be determined whether a given 
variable is ever used in an ASET'. In this way one can say that variables are 
divided by the compiler into two classes: those which are implicitly cells, and 
those which are not.) 

A closure (LAMBDA-expression) causes a CONS side-effect. This is not so 
much because SCHEME programs depend on being able to do EQ on closures (it is 
unclear whether this is a reasonable thing to specify in the semantics of 
SCHEME), as because there is no point in creating two closures when one will 
suffice. Adjoining CONS to EFFS will prevent the creation of such duplicate code 
by the optimizer. The same idea holds for LABELS (which has LAMBDA-expressions 
within it). 

Notice that a LAMBDA node does not add to its four sets the information 
from its body's sets. This is because evaluation of a LAMBDA-expression does not 
immediately evaluate the body. Only later, when the resulting closure is 
invoked, is the body executed. 

EFFS-UNION gives the union of two sets of side effects. It knows about 
the special symbols NONE and ANY. 

EFFS-ANALYZE-IF computes the side-effect sets for IF nodes. It has been 
made a separate function only because its code is so bulky; it must perform a 
three-way union for each of four sets. 

EFFS-ANALYZE-COMBINATION computes the side-effect sets for COMBINATION 
nodes. First the function is analyzed, then the arguments. The unions of the 
four sets over all the arguments are accumulated in EF, AF, PEF, and PAF. CHECK- 
COMBINATION-PEFFS is called to warn the user of any possible violations of the 
rule that SCHEME is privileged to choose the order in which to evaluate the 
subforms of a combination. Finally, there are three cases depending on the form 
of the function position. 

If it is a variable, then the property list of the variable name is 
searched for information about that function. (The generated names for local 
variables will never have any such information; thus information will be found 
only for global variables. This information is used to augment the sets. (A 
clever technique not used in RABBIT would be to arrange for situations like 
((LAMBDA (F) <bodyl>) (LAMBDA (...) <body2>), where F denotes a "known function" 
(see the description of BIND-ANALYZE below), to put on the property list of F the 
side-effect information for <body2>, to aid optimization in <bodyl>.) 

If the function position is a LAMBDA-expression, then the four sets of 
the body of the LAMBDA-expression are unioned into the four sets for the 
COMBINATION node. This is because in this case we know that the body LAMBDA- 
expression will be executed in the course of executing the COMBINATION node. 

In any other case, an unknown function is computed, and so it must be 
assumed that any side-effect is possible for EFFS and AFFD. 
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(DEFINE EFFS-UNION 

(LAMBDA (A B) 

(COND ((EO 
((EQ 
<(EQ 
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A 
B 
A 
((EQ B 



'NONE) B) 
■NONE) A) 
•ANY) 'ANY) 
•ANY) 'ANY) 



(T (UNION A B))))) 

(DEFINE EFFS-ANALYZE-IF 

(LAMBDA (NODE FM REDO) 

(BLOCK (EFFS-ANALYZE ( IF\PRED FM) REDO) 
(EFFS-ANALYZE (IFXCON FM) REDO) 
(EFFS-ANALYZE (IF\ALT FM) REDO) 
(ALTER-NODE NODE 
(EFFS 



= (EFFS-UNION (NODEXEFFS (IF\PRED FM)) 

(EFFS-UNION (NODE\EFFS (IFXCON FM)) 

(NODEXEFFS (IF\ALT FM))))) 
(AFFD := (EFFS-UNION (NODEXAFFD (IF\PRED FM)) 

(EFFS-UNION (NODEXAFFD ( IF\CON FM)) 

(NODEXAFFD (IFXALT FM))))) 
(PEFFS := (EFFS-UNION (NODEXPEFFS ( IF\PRED FM)) 

(EFFS-UNION (NODEXPEFFS (IFXCON FM)) 

(NODEXPEFFS ( IF\ALT FM))))) 
(PAFFD := (EFFS-UNION (NODEXPAFFD ( IFXPRED FM)) 

(EFFS-UNION (NODEXPAFFD (IFXCON FM)) 

(NODEXPAFFD (IF\ALT FM)) ))))))) 



(SET' *CHECK-PEFFS* NIL) 



(DEFINE EFFS-ANALYZE-COMBINATION 
(LAMBDA (NODE FM REDO) 

(LET ((ARCS (COMBINATIONXARGS FM))) 
(EFFS-ANALYZE (CAR ARGS) REDO) 
(DO ((A (CDR ARGS) (CDR A)) 

(EF 'NONE (EFFS-UNION EF (NODEXEFFS (CAR A)))) 
(AF 'NONE (EFFS-UNION AF (NODEXAFFD (CAR A)))) 
(PEF 'NONE (EFFS-UNION PEF (NODEXPEFFS (CAR A)))) 
(PAF 'NONE (EFFS-UNION PAF (NODEXPAFFD (CAR A))))) 
((NULL A) 
(IF *CHECK-PEFFS* (CHECK-COMBINATION-PEFFS FM)) 
(COND ((EO (TYPE (NODEXFORM (CAR ARGS))) 'VARIABLE) 

(LET ((V (VARIABLEXVAR (NODEXFORM (CAR ARGS))))) 
(LET ((VE (GET V 'FN-SIDE-EFFECTS) ) 
(VA (GET V 
(ALTER-NODE 



'FN-SIDE-AFFECTED))) 

NODE 

= (IF VE (EFFS-UNION EF VE ) 
= (IF VA (EFFS-UNION AF VA) 
:= (EFFS-UNION PEF VE ) ) 
:= (EFFS-UNION PAF VA)))))) 



•ANY)) 
•ANY)) 



(EFFS 
(AFFD 
(PEFFS 
(PAFFD 

((EQ (TYPE (NODEXFORM (CAR ARGS))) 'LAMBDA) 
(LET ((B (LAMBDAXBODY (NODEXFORM (CAR ARGS))))) 
(ALTER-NODE NODE 

(EFFS := (EFFS-UNION EF (NODEXEFFS B))) 
(AFFD := (EFFS-UNION AF (NODEXAFFD B))) 
(PEFFS := (EFFS-UNION PEF (NODEXPEFFS B))) 
(PAFFD := (EFFS-UNION PAF (NODEXPAFFD B)))))) 
(T (ALTER-NODE NODE 

(EFFS := 'ANY) 
(AFFD := 'ANY) 
(PEFFS := (EFFS-UNION PEF 

(NODEXPEFFS (CAR ARGS)))) 
(PAFFD := (EFFS-UNION PAF 

(NODEXPAFFD (CAR ARGS)))))))) 
(EFFS-ANALYZE (CAR A) REDO))))) 
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CHECK-CONBINATION-PEFFS checks all the argument forms of a combination 
(including the function position) to see if they are all independent of each 
other with respect to side effects. If not, a warning is issued. This is 
because the semantics of SCHEME specify that the arguments may be evaluated in 
any order, and the user may not depend on a particular ordering. 

The test is made by comparing all pairs of arguments within the 
combination. If the side-effects of one can "provably" affect the evaluation of 
the other, or if they both cause a side effect of the same category (other than 
CONS, which is special), then the results may depend on which order they are 
evaluated in. The test is not completely rigorous, and may err in either 
direction, but "probably" a reasonably written SCHEME program will satisfy the 
test. 

This check is controlled by the switch *CHECK-PEFFS* in EFFS-ANALYZE- 
COMBINATION. This switch is provided because empirical tests show that 
performing the test slows down compilation by twenty to thirty percent. The test 
has proved valuable in trapping programming errors, and so is normally on, but it 
can be turned off for speed in compiling programs in which one has confidence. 

EFFDEF is a macro which expands into a number of DEFPROP forms. This is 
used to define side-effect information about primitive functions. For example: 

(EFFDEF CADR NONE (RPLACA RPLACD)) 

states that CADR causes no side-effects, and is "provably" affected by the RPLACA 
and RPLACD categories of side-effects. Similarly: 

(EFFDEF MEMQ NONE (RPLACA RPLACD) T) 

specifies the same information for MEMQ, but the "T" means that a call to MEMQ 
with constant arguments may be "folded" (evaluated, and the result compiled 
instead), despite the fact that some side effects can affect it. This represents 
a judgement that it is extremely unlikely that someone will write a program which 
modifies a constant argument to be given to MEMQ. 
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002 (DEFINE CHECK-COMBINATION-PEFFS 

003 (LAMBDA (FM) 

004 (IF (NOT (COMBINATION\WARNP FM)) 

005 (DO ((A (COMBINATIONXARGS FM) (CDR A))) 

006 ((NULL A)) 

007 (DO ((B (CDR A) (CDR B))) 

008 ((NULL B)) 

009 (IF (NOT (EFFECTLESS (EFFS- INTERSECT (NODEXPEFFS (CAR A)) 

010 (NODEXPAFFD (CAR B))))) 
Oil (BLOCK (WARN |co-argument may affect later one | 

012 (NODEXSEXPR (CAR A)) 

013 "(EFFECTS = , (NODEXPEFFS (CAR A))) 

014 (NODEXSEXPR (CAR B)) 

015 "(AFFECTED BY , (NODEXPAFFD (CAR B)))) 

016 (ALTER-COMBINATION FM (WARNP := T)))) 

017 (IF (NOT (EFFECTLESS (EFFS- INTERSECT (NODEXPEFFS (CAR B)) 

018 (NODEXPAFFD (CAR A))))) 

019 (BLOCK (WARN |co-argument may affect earlier one| 

020 (NODEXSEXPR (CAR B)) 

021 "(EFFECTS = .(NODEXPEFFS (CAR B))) 

022 (NODEXSEXPR (CAR A)) 

023 "(AFFECTED BY .(NODEXPAFFD (CAR A)))) 

024 (ALTER-COMBINATION FM (WARNP := T)))) 

025 (IF (NOT (EFFECTLESS-EXCEPT-CONS (EFFS- INTERSECT (NODEXPEFFS (CAR A)) 

026 (NODEXPEFFS (CAR B))))) 

027 (BLOCK (WARN |co-arguments may have interfering effects | 

028 (NODEXSEXPR (CAR A)) 

029 "(EFFECTS = .(NODEXPEFFS (CAR A))) 

030 (NODEXSEXPR (CAR B)) 

031 "(EFFECTS = .(NODEXPEFFS (CAR B)))) 

032 (ALTER-COMBINATION FM (WARNP := T))))))))) 
033 

034 (DEFMAC EFFDEF (FN EFFS AFFD . FOLD) 

035 "(PROGN (DEFPROP .FN .EFFS FN-SIDE-EFFECTS) 

036 (DEFPROP .FN ,AFFD FN-SIDE-AFFECTED) 

037 .(AND FOLD "(DEFPROP .FN T OKAY-TO-FOLD)))) 
038 

039 (DECLARE (/3DEFINE EFFDEF |SIDE EFFECTS|)) 
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This page contains declarations of side-effect information for many 
standard primitive functions. The EFFDEF macro used to make the declarations is 
described on the previous page. 
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(PROGN 'COMPILE 

EFFDEF + NONE NONE) 

EFFDEF - NONE NONE) 

EFFDEF * NONE NONE) 

EFFDEF // NONE NONE) 

EFFDEF = NONE NONE) 

EFFDEF < NONE NONE) 

EFFDEF > NONE NONE) 

EFFDEF CAR NONE (RPLACA)) 

EFFDEF CDR NONE (RPLACD)) 

EFFDEF CAAR NONE (RPLACA)) 

EFFDEF CADR NONE (RPLACA RPLACD)) 

EFFDEF CDAR NONE (RPLACA RPLACD)) 

EFFDEF CDDR NONE (RPLACD)) 

EFFDEF CAAAR NONE (RPLACA)) 

EFFDEF CAADR NONE (RPLACA RPLACD)) 

EFFDEF CADAR NONE (RPLACA RPLACD)) 

EFFDEF CADDR NONE (RPLACA RPLACD)) 

EFFDEF CDAAR NONE (RPLACA RPLACD)) 

EFFDEF CDADR NONE (RPLACA RPLACD)) 

EFFDEF CDDAR NONE (RPLACA RPLACD)) 

EFFDEF CDDDR NONE (RPLACD)) 
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(RPLACA)) 
(RPLACA RPLACD) 
(RPLACA RPLACD) 
(RPLACA RPLACO) 
(RPLACA RPLACD) 
(RPLACA RPLACD) 
(RPLACA RPLACD) 
(RPLACA RPLACD) 
(RPLACA RPLACD) 
(RPLACA RPLACD) 
(RPLACA RPLACD) 
(RPLACA RPLACD) 
(RPLACA RPLACD) 
(RPLACA RPLACD) 
(RPLACA RPLACD) 
(RPLACD)) 



EFFDEF CAAAAR NONE 
EFFDEF CAAADR NONE 
EFFDEF CAADAR NONE 
EFFDEF CAADDR NONE 
EFFDEF CADAAR NONE 
EFFDEF CADADR NONE 
EFFDEF CADDAR NONE 
EFFDEF CADDDR NONE 
EFFDEF CDAAAR NONE 
EFFDEF CDAADR NONE 
EFFDEF CDADAR NONE 
EFFDEF CDADDR NONE 
EFFDEF CDDAAR NONE 
EFFDEF CDDADR NONE 
EFFDEF CDDDAR NONE 
EFFDEF CDDDDR NONE 
EFFDEF CXR NONE (RPLACA RPLACD)) 
EFFDEF RPLACA (RPLACA) NONE) 
EFFDEF RPLACD (RPLACA) NONE) 
EFFDEF RPLACX (RPLACA RPLACD) NONE) 
EFFDEF EO NONE NONE) 
EFFDEF ATOM NONE NONE) 
EFFDEF NUMBERP NONE NONE) 
EFFDEF TYPEP NONE NONE) 
EFFDEF SYMBOLP NONE NONE) 
EFFDEF HUNKP NONE NONE) 
EFFDEF FIXP NONE NONE) 
EFFDEF FLOATP NONE NONE) 
EFFDEF BIGP NONE NONE) 
EFFDEF NOT NONE NONE) 
EFFDEF NULL NONE NONE) 
EFFDEF CONS (CONS) NONE) 
EFFDEF LIST (CONS) NONE) 
EFFDEF APPEND (CONS) (RPLACD)) 
EFFDEF MEMO NONE (RPLACA RPLACD) T) 
EFFDEF ASSQ NONE (RPLACA RPLACD) T) 
EFFDEF PRINT (FILE) (FILE RPLACA RPLACD)) 
EFFOEF PRIN1 (FILE) (FILE RPLACA RPLACO)) 
EFFDEF PRINC (FILE) (FILE RPLACA RPLACD)) 
EFFDEF TERPRI (FILE) (FILE)) 
EFFDEF TYO (FILE) (FILE)) 
EFFDEF READ ANY (FILE)) 
EFFDEF TYI ANY (FILE)) 
'SIDE -EFFECTS-PROPERTIES) 
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ERASE-NODE and ERASE-ALL-NODES are convenient mnemonic macros used to 
invoke ERASE-NODES. 

ERASE-NODES is used by the optimizer to destroy nodes which have been 
removed from the program tree because of some optimization. If ALLP is NIL 
(ERASE-NODE), then only the given node is erased; if it is T (ERASE-ALL-NODES), 
then the given node and all descendants, direct and indirect, are erased. 

Erasing a node may involve removing certain properties from property 
lists. This is necessary to maintain the consistency of the properties. For 
example, if a VARIABLE node is erased, then that node must be removed from the 
READ-REFS property of the variable name. The optimizer depends on this so that, 
for example, it can determine whether all references to a variable have been 
erased. 

It should be noted in passing that in principle all occurrences of ASET 
on a given variable could be erased, thereby reducing its WRITE-REFS property to 
NIL. Because the EFFS-ANALYZE computation on VARIABLE nodes used the WRITE-REFS 
property, a VARIABLE node might have ASET in its AFFD set after the optimizer had 
removed all the ASET nodes. Because of the tree-walking discipline of the 
optimizer, the VARIABLE nodes will not be reanalyzed immediately. This cannot 
hurt, however; it may just cause the optimizer later to be more cautious than 
necessary when examining a VARIABLE node. (If this doesn't make sense, come back 
after reading the description of the optimizer.) 

The flag *TESTING* is used to determine whether or not to remove the node 
from the NODE property on the node's name. When debugging, it is very useful to 
keep this information around to trace the optimizer's actions; but when 
compiling a large function for "production" purposes, the discarded nodes may 
bloat memory, and so they must be removed from the NODE property in order that 
they may be garbage-collected by LISP. 
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002 ;;; THIS ROUTINE IS USED TO UNDO ANY PASS 1 ANALYSIS ON A NODE. 
003 

004 (DEFMAC ERASE-NODE (NOOE) "(ERASE-NODES .NODE NIL)) 

005 (DEFMAC ERASE-ALL-NODES (NODE) "(ERASE-NODES .NODE T)) 
006 

007 (DEFINE ERASE-NODES 

008 (LAMBDA (NODE ALLP) 

009 (LET ((FM (NODE\FORM NODE))) 

010 (OR (EQ (TYPE NODE) 'NODE) 

OH (ERROR ' |Cannot erase a non-node) NODE 'FAIL-ACT)) 

012 (EQCASE (TYPE FM) 

013 (CONSTANT) 

014 (VARIABLE 

015 (DELPROP (VARIABLEWAR FM) NODE 'READ-REFS)) 

016 (LAMBDA 

017 (IF ALLP (ERASE-ALL-NODES (LAMBDA\BODY FM))) 

018 (IF (NOT *TESTING*) 

019 (AMAPC (LAMBDA (V) (REMPROP V 'BINDING)) (LAMBDAWARS FM)))) 

020 (IF (COND (ALLP (ERASE-ALL-NODES ( IF\PRED FM)) 

021 (ERASE-ALL-NODES (IF\CON FM)) 

022 (ERASE-ALL-NODES ( IF\ALT FM))))) 

023 (ASET 

024 (IF ALLP (ERASE-ALL-NODES (ASET\BODY FM))) 

2 5 (DELPROP (ASETWAR FM) NODE 'WRITE-REFS) ) 

026 (CATCH 

2 ? (IF ALLP (ERASE-ALL-NODES (CATCH\BODY FM))) 

028 (IF (NOT *TESTING*) 

29 (REMPROP (CATCHWAR FM) 'BINDING))) 

030 (LABELS 

031 (COND (ALLP i AMAPC (LAMBDA (D) (ERASE-ALL-NODES D)) 

032 (LABELS\FNDEFS FM)) 

033 (ERASE-ALL-NODES (LABELS\BODY FM)))) 

034 (IF (NOT *TESTING*) 

035 (AMAPC (LAMBDA (V) (REMPROP V 'BINDING)) (LABELS\FNVARS FM)))) 

036 (COMBINATION 

037 (IF ALLP (AMAPC (LAMBDA (A) (ERASE-ALL-NODES A)) 

03 8 (COMBINATIONNARGS FM))))) 

039 (IF (NOT *TESTING*) 

040 (REMPROP (NODE\NAME NODE) 'NODE))))) 
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META-EVALUATE is the top-level function of the optimizer. It accepts a 
node, and returns a node (not necessarily the same one) for an equivalent 
program. 

The METAP flags in the nodes are used to control re-analysis. META- 
EVALUATE checks this flag first thing, and returns the given node immediately if 
its METAP flag is non-NIL, meaning the node has already been properly optimized. 
Otherwise it examines the node more carefully. 

Some rules about the organization of the optimizer: 
[1] A node returned by a call to META-EVALUATE will always have its METAP flag 
set. 

[2] The descendants of a node must be meta-evaluated before any information in 
them is used. 

[3] If a node has its METAP flag set, so do all of its descendants. Moreover, 
REANALYZE1 has been applied to the node, so all of the information filled in by 
pass-1 analysis (ENV-ANALYZE, TRIV-ANALYZE, and EFFS-ANALYZE) is up-to-date. 

When COMPILE calls META-EVALUATE, all the METAP flags are NIL, and no 
pass-1 analysis has been performed. META-EVALUATE, roughly speaking, calls 
itself recursively, and meta-evaluates the node tree from the bottom up. After 
meta-evaluating all the descendants of a node, it applies REANALYZE1 to perform 
pass-1 analysis on that node, sets the METAP flag, and returns the node. 
Exceptions can be made to this discipline if a non-trivial optimization occurs. 

If the (meta-evaluated) predicate part of an IF node is itself an IF node 
(and the debugging switch *FUDGE* is set), then META-IF-FUDGE is called. If it 
is a constant, then the value of the constant is used to select either the 
consequent CON or the alternative ALT. The other one is then erased, and the IF 
node is itself erased. The selected component node is then returned (it has 
already been meta-evaluated). The statistics counter *DEAD-COUNT* counts 
occurrences of this "dead code elimination" optimization. 

The other two interesting cases are COMBINATION nodes whose function 
position contains either a trivial function or a LAMBDA node. META-COMBINATION- 
TRIVFN and META-COMBINATION-LAMBDA handle these respective cases. 
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002 ;;; THE VALUE OF META-EVALUATE IS THE (POSSIBLY NEW) NODE RESULTING FROM THE GIVEN ONE 
003 

004 (SET' *FUDGE* T) ;SWITCH TO CONTROL META- IF-FUDGE 

005 (SET' *DEAD-COUNT* 0) ;COUNT OF DEA0-CO0E ELIMINATIONS 
006 

007 (DEFINE META-EVALUATE 

008 (LAMBDA (NODE) 

009 (IF (NODEXMETAP NODE) 

010 NODE 

Oil (LET ((FM (NODEXFORM NODE))) 

012 (EQCASE (TYPE FM) 

013 (CONSTANT 

014 ( REANALYZE 1 NODE) 

015 (ALTER-NOOE NODE (METAP := T))) 

016 (VARIABLE 

017 (RE ANALYZE 1 NODE) 

018 (ALTER-NODE NODE (METAP := T))) 

019 (LAMBDA 

020 (ALTER-LAMBDA FM (BODY := (META-EVALUATE (LAMBDA\BODY FM)))) 
°21 ( REANALYZE 1 NODE) 

022 (ALTER-NODE NODE (METAP := T))) 

023 (IF 

024 (ALTER-IF FM 

025 (PREO := (META-EVALUATE (IF\PRED FM))) 

026 (CON := (META-EVALUATE (IF\CON FM))) 

027 (ALT := (META-EVALUATE (IF\ALT FM)))) 

028 (IF (AND *FUDGE* (EQ (TYPE (NODE\FORM (IF\PRED FM))) 'IF)) 

029 (META-IF-FUDGE NODE) 

030 (IF (EQ (TYPE (NODEXFORM (IF\PRED FM))) 'CONSTANT) 

031 (LET ((CON (IFXCON FM)) 

032 (ALT (IF\ALT FM)) 

033 (VAL (CONSTANTWALUE (NODE\FORM ( IF\PRED FM))))) 

034 (ERASE-NODE NODE) 

035 (ERASE-ALL-NODES (IF\PRED FM)) 

036 (INCREMENT *DEAD-COUNT* ) 

037 (IF VAL 

038 (BLOCK (ERASE-ALL-NODES ALT) CON) 

039 (BLOCK (ERASE-ALL-NODES CON) ALT))) 

040 (BLOCK (REANALYZE1 NODE) 

041 (ALTER-NODE NODE (METAP := T)))))) 

042 (ASET 

043 (ALTER-ASET FM (BODY := (META-EVALUATE (ASET\BODY FM)))) 

044 (REANALYZE1 NODE) 

04 5 (ALTER-NODE NODE (METAP := T))) 

046 (CATCH 

047 (ALTER-CATCH FM (BODY := (META-EVALUATE (CATCH\BODY FM)))) 

04 8 (REANALYZE1 NODE) 

049 (ALTER-NODE NODE (METAP := T))) 

050 (LABELS 

051 (DO ((D (LABELSXFNDEFS FM) (CDR D))) 

052 ((NULL D)) 

053 (RPLACA D (META-EVALUATE (CAR D)))) 

054 (ALTER-LABELS FM (BODY := (META-EVALUATE (LABELSXBODY FM)))) 
°55 (REANALYZE1 NODE) 

°56 (ALTER-NODE NODE (METAP := T))) 

057 (COMBINATION 

058 (LET ((FN (NODEXFORM (CAR (COMBINATIONXARGS FM))))) 

059 (COND ((AND (EQ (TYPE FN) 'VARIABLE) 

060 (TRIVFN (VARIABLEXVAR FN))) 

061 (META-COMBINATION-TRIVFN NODE)) 

062 ((EQ (TYPE FN) 'LAMBDA) 

063 (META-COMBINATION-LAMBDA NODE)) 

064 (T (DO ((A (COMBINATIONXARGS FM) (CDR A))) 

065 ((NULL A)) 

066 (RPLACA A (META-EVALUATE (CAR A)))) 

067 ( REANALYZE 1 NODE) 

068 (ALTER-NODE NODE (METAP := T))))))))))) 
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For an IF nested within another IF, the transformation shown in the 
comment is performed. This involves constructing an S-expression of the 
appropriate form and then calling ALPHATIZE to convert it into a node-tree. (The 
node-tree could be constructed directly, but it is easier to call ALPHATIZE. 
This is the reason why ALPHATIZE merely returns a NODE if it encounters one in 
the S-expression; META-IF-FUDGE inserts various nodes in the S-expression it 
constructs.) The original two IF nodes are erased, a statistics counter •FUDGE- 
COUNT* is incremented, and the new expression is meta-evaluated and returned in 
place of the nested IF nodes. 

(The statistics counter shows that this optimization is performed with 
modest frequency, arising from cases such as (IF (AND ...)...).) 

NETA-COMBINATION-TRIVFN performs the standard recursive meta-evaluation 
of all the arguments, and then checks to see whether the combination can be 
"folded". This is possible all the arguments are constants, and if the function 
has no side effects and cannot be affected by side-effects, or has an OKAY-TO- 
FOLD property. If this is the case, the function is applied to the arguments, 
the combination node and its descendants are erased, the statistics counter 
*FOLD-COUNT* is bumped, and a new CONSTANT node containing the result is created 
and meta-evaluated. This might typically occur for (NOT NIL) => T, or (+ 3 4) => 
7, or (MEMQ 'BAR '(FOO BAR BAZ)) => '(BAR BAZ). If this optimization is not 
permissible, then the usual reanalysis and setting of the METAP flag is 
performed. 

(The statistics counter shows that even in a very large program such as 
RABBIT this optimization is performed fewer than a dozen times. This may be due 
to my programming style, or because there are very few macros in the code for 
RABBIT which might expand into foldable code.) 
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002 ;;; TRANSFORM ( IF ( IF A B C) D E ) INTO: 

003 ; ; ; ((LAMBDA (Dl El) 

004 -: (IF A (IF B (Dl) (El)) (IF C (Dl) (El)))) 

005 ; ; ; (LAMBDA ( ) D) 

006 ;;; (LAMBDA ( ) E)) 
007 

008 (SET 1 *FUDGE-COUNT* 0) ;COUNT OF IF-FUDGES 

009 

010 (DEFINE META-IF-FUDGE 

Oil (LAMBDA (NODE) 

012 (LET ((FM (NODEXFORM NOOE ) ) ) 

01 3 (LET ((PFM (NODEXFORM (IF\PRED FM)))) 

014 (LET ((N (ALPHATIZE (LET ((CONVAR (GENTEMP 'META-CON)) 

015 (ALTVAR (GENTEMP 'META-ALT))) 

016 "((LAMBDA (.CONVAR .ALTVAR) 

017 (IF ,(IF\PRED PFM) 

018 (IF ,(IF\CON PFM) 

019 (.CONVAR) 

020 (.ALTVAR)) 

021 (IF ,(IF\ALT PFM) 

022 (.CONVAR) 

023 (.ALTVAR)))) 

024 (LAMBDA () ,(IF\CON FM)) 

025 (LAMBDA () ,(IF\ALT FM)))) 

026 (NODEXENV NODE)))) ;DOESN'T MATTER 

027 (ERASE-NODE NODE) 

028 (ERASE-NODE ( IF\PRED FM)) 
°29 (INCREMENT *FU0GE-COUNT*) 
030 (META-EVALUATE N)))))) 
031 

032 ;;; REDUCE A COMBINATION WITH A SIDE-EFFECT-LESS TRIVIAL 

033 ;;; FUNCTION AND CONSTANT ARGUMENTS TO A CONSTANT. 
034 

035 (SET' *FOLD-COUNT* 0) : COUNT OF CONSTANT FOLDINGS 

036 

037 (DEFINE META-COMBINATION-TRIVFN 

038 (LAMBDA (NODE) 

039 (LET ((FM (NODEXFORM NODE))) 

040 (LET ((ARGS (COMBINATION\ARGS FM))) 

041 (RPLACA ARGS (META-EVALUATE (CAR ARGS))) 

042 (DO ((A (CDR ARGS) (CDR A)) 

043 (CONSTP (LET ((FNNAME (VARIABLEWAR (NODEXFORM (CAR ARGS))))) 

044 (OR (AND (EO (GET FNNAME 

045 'FN-SIDE-EFFECTS) 

046 'NONE) 

047 (EO (GET FNNAME 

048 'FN-SIDE-AFFECTED) 

049 'NONE)) 

050 (GET FNNAME 'OKAY-TO-FOLD))) 

051 (AND CONSTP (EO (TYPE (NODEXFORM (CAR A))) 'CONSTANT)))) 

052 ((NULL A) 

053 (COND (CONSTP 

054 (LET ((VAL (APPLY (VARIABLEWAR (NODEXFORM (CAR ARGS))) 

055 (AMAPCAR (LAMBDA (X) 

JJ 58 (CONSTANTXVALUE 

° n l 7 a (NODEXFORM X))) 

058 (CDR ARGS))))) 

059 (ERASE-ALL-NODES NODE) 

060 (INCREMENT *FOLD-COUNT* ) 

061 (META-EVALUATE (ALPHATIZE "(QUOTE .VAL) NIL)))) 

062 (T (REANALYZE1 NODE) 

063 (ALTER-NODE NODE (METAP := T))))) 

064 (RPLACA A (META-EVALUATE (CAR A)))))))) 
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META-COMBINATION-LAMBDA performs several interesting optimizations on 
combinations of the form ((LAMBDA ...)...). It is controlled by several 
debugging switches, and keeps several statistics counters, which we will not 
describe further. 

First all the arguments, but not the LAMBDA-expression, are meta- 
evaluated by the first DO loop. Next, the body of the LAMBDA node is meta- 
evaluated and kept in the variable B in the second DO loop. This loop iterates 
over the LAMBDA variables and the corresponding arguments. For each variable- 
argument pair, SUBST-CANDIDATE determines whether the argument can "probably" be 
legally substituted for occurrences of the variable in the body. If so, META- 
SUBSTITUTE is called to attempt such substitution. When the loop finishes, B has 
the body with all possible substitutions performed. This body is then re-meta- 
evaluated. (The reason for this is explained later in the discussion of META- 
SUBSTITUTE.) 

Next an attempt is made to eliminate LAMBDA variables. A variable and 
its corresponding argument may be eliminated if the variable has no remaining 
references, and the argument either has no side effects or has been successfully 
substituted. (If an argument has side effects, then SUBST-CANDIDATE will give 
permission to attempt substitution only if no more than one reference to the 
corresponding variable exists. If the substitution fails, then the argument may 
not be eliminated, because its side effects must not be lost. It the 
substitution succeeds, then the argument must be eliminated, because the side 
effects must not be duplicated.) A consistency check ensures that in fact the 
variable is unreferenced within the body as determined by its REFS and ASETS 
slots; then the argument and variable are deleted, and the nodes of the argument 
are erased. 

When all possible variable-argument pairs have been eliminated, then 
there are two cases. If the LAMBDA has no variables left, then the combination 
containing it can be replaced by the body of the LAMBDA node. In this case the 
LAMBDA and COMBINATION nodes are erased. Otherwise the LAMBDA and COMBINATION 
nodes are reanalyzed and their METAP flags are set. 

(The statistics counters show that when RABBIT compiles itself these 
three optimizations are performed hundreds of times. This occurs because many 
standard macros make use of closures to ensure that variables local to the code 
for the macro do not conflict with user variables. These closures often can be 
substituted into the code by the compiler and eliminated.) 



001 RABBIT 568 05/15/78 Page 21 

002 (SET' *FLUSH-ARGS* T) jSWlTCH TO CONTROL VARIABLE ELIMINATION 

003 (SET- *FLUSH-COUNT* 0) ;COUNT OF VARIABLES ELIMINATED 

004 (SET- *CONVERT-COUNT* 0) ;CQUNT OF FULL BETA-CONVERSIONS 
005 

006 (OEFINE 

007 META-COMBINATION-LAMBDA 

008 (LAMBDA (NODE) 

009 (LET ((FM (NODE\FORM NODE))) 

010 (LET ((ARGS (COMBINATION\ARGS FM))) 
OH (DO ((A (CDR ARGS) (CDR A))) 

012 ((NULL A)) 

013 (RPLACA A (META-EVALUATE (CAR A))) 

014 (ALTER-NODE (CAR A) (SUBSTP := NIL))) 

015 (LET ((FN (NOOEXFORM (CAR ARGS)))) 

016 (DO ((V (LAMBDAWARS FN) (CDR V)) 

017 (A (CDR ARGS) (CDR A)) 

01 Q (B (META-EVALUATE (LAMBDA\BODY FN)) 

019 (IF (SUBST-CANDIDATE (CAR A) (CAR V) B) 

020 (META-SUBSTITUTE (CAR A) (CAR V) B) 

021 B))) 

022 ((NULL V) 

023 (ALTER-LAMBDA FN (BODY := (META-EVALUATE B))) 

024 (DO ((V (LAMBDAWARS FN) (CDR V)) 

025 (A (CDR ARGS) (CDR A))) 

2 6 ((NULL A)) 

027 (IF (AND *FLUSH-ARGS* 

028 (NULL (GET (CAR V) 'READ-REFS)) 

029 (NULL (GET (CAR V) ' WRITE-REFS) ) 

030 (OR (EFFECTLESS-EXCEPT-CONS (NODE\EFFS (CAR A))) 

031 (NODEXSUBSTP (CAR A)))) 

032 (BLOCK (IF (OR (MEMO V (NODEXREFS (LAMBDAXBODY FN))) 

033 (MEMO V (NODENASETS (LAMBDAXBODY FN)))) 

034 (ERROR '|Reanalysis lost - META-COMBINATION-LAMBDA | 

035 NODE 

036 'FAIL-ACT)) 

037 (DELO (CAR A) ARGS) 

038 (ERASE-ALL-NODES (CAR A)) 

039 (INCREMENT *FLUSH-COUNT* ) 

040 (ALTER-LAMBDA FN 

041 (VARS := (DELO (CAR V) (LAMBDAWARS FN))) 

042 (UVARS := (DELO (GET (CAR V) 'USER-NAME) 

043 (LAMBDAXUVARS FN))))))) 

044 (COND ((NULL (LAMBDAWARS FN)) 

04 5 (OR (NULL (CDR ARGS)) 

046 (ERROR '|Too many args in META-COMBINATION-LAMBDA | 

047 NODE 

048 'FAIL-ACT)) 

049 (LET ((BOD (LAMBDAXBODY FN))) 

050 (ERASE-NODE (CAR ARGS)) 

05 1 (ERASE-NODE NODE) 

052 (INCREMENT *CONVERT-COUNT* ) 

053 BOD)) 

054 (T ( REANALYZE 1 (CAR ARGS)) 

055 (ALTER-NODE (CAR ARGS) (METAP := T)) 

056 (REANALYZE1 NODE) 

057 (ALTER-NODE NODE (METAP := T))))))))))) 
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(SUBST-CANDIDATE ARG VAR BOD) is a predicate which is true iff it is 
apparently legal to attempt to substitute the argument ARG for the variable VAR 
in the body BOD. This predicate is very conservative , because there is no 
provision for backing out of a bad choice. The decision is made on this basis: 
[1] There must be no ASET references to the variable. (This is overly 
restrictive, but is complicated to check for correctly, and makes little 
difference in practice.) 
[2] One of three conditions must hold: 

[2a] There is at most one reference to the variable. (Code with possible 

side effects must not be duplicated. Exceptions occur, for example, if there 

are two references, one in each branch of an IF, so that only one can be 

executed. This is hard to detect, and relaxing this restriction is probably 

not worthwhile. ) 

[2b] The argument is a constant or variable. (This is always safe because 

the cost of a constant or variable is no worse than the cost of referencing 

the variable it replaces.) 

[2c] The argument is a LAMBDA-expression, and either: 

[2cl] There is no more than one reference. (This is tested again because 
of the presence of debugging switches in SUBST-CANDIDATE which can control 
various tests independently to help localize bugs.) 

[2c2] The body of the LAMBDA-expression is a combination, all of whose 
descendants are constants or variables, and the number of arguments of the 
combination (not counting the function) does not exceed the number of 
arguments taken by the LAMBDA-expression. (The idea here is that 
substitution of the LAMBDA-expression into function position of some 
combination will later allow reduction to a combination which is no worse 
than the original one. This test is a poor heuristic if references to the 
variable VAR occur in other than function position within BOD, because then 
several closures will be made instead of one, but is very good for code 
typically produced by the expansion of macros. In retrospect, perhaps ENV- 
ANALYZE should maintain a third property besides READ-REFS and WRITE-REFS 
called, say, NON-FN-REFS. This would be the subset of READ-REFS which 
occur in other than function position of a combination. SUBST-CANDIDATE 
could then use this information. Alternatively, META-SUBSTITUTE could, as 
it walked the node-tree of the body, keep track of whether a variable was 
encountered in function position, and refuse to substitute a LAMBDA- 
-expression for a variable not in such a position which had more than one 
reference. This might in turn prevent other optimizations, however.) 
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(SET 1 *SUBSTITUTE* T) 
(SET* *SINGLE-SUBST* T) 
(SET 1 *LAMBDA-SUBST* T) 
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SWITCH TO CONTROL SUBSTITUTION 

SWITCH TO CONTROL SUBSTITUTION OF EXPRESSIONS WITH SIDE EFFECTS 
SWITCH TO CONTROL SUBSTITUTION OF LAMBDA-EXPRESSIONS 



;BE PARANOID FOR NOW 



(DEFINE SUBST-CANDIDATE 

(LAMBDA (AR6 VAR BOD) 

(AND *SUBSTITUTE* 

(NOT (GET VAR • WRITE-REFS ) ) 
(OR (AND *SINGLE-SUBST* 

(NULL (CDR (GET VAR 'READ-REFS) ) ) ) 
(MEMO (TYPE (NODE\FORM ARG)) '(CONSTANT VARIABLE)) 
(AND *LAMBDA-SUBST* 

(EQ (TYPE (NODENFORM ARG)) 'LAMBDA) 
(OR (NULL (CDR (GET VAR 'READ-REFS))) 

(LET ((B (NODE\FORM (LAMBDA\BODY (NODENFORM ARG))))) 
(OR (MEMO (TYPE B) '(CONSTANT VARIABLE)) 
(AND (EQ (TYPE B) 'COMBINATION) 

(NOT (> (LENGTH (CDR (COMBINATION\ARGS B))) 

(LENGTH (LAMBDAWARS (NODENFORM ARG))))) 
(DO ((A (COMBINATIONNARGS B) (CDR A)) 

(P T (AND P (MEMO (TYPE (NODENFORM (CAR A))) 
'(CONSTANT VARIABLE))))) 
((NULL A) P))))))))))) 
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REANALYZE 1 calls PASS1 -ANALYZE on the given node. The argument T means 
that optimization is in effect, and so EFFS-ANALYZE must be invoked after ENV- 
ANALYZE and TRIV-ANALYZE (EFFS-ANALYZE information is used only by the 
optimizer). The argument *REANALYZE* specifies whether reanalysis should be 
forced to all descendant nodes, or whether reanalysis of the current node will 
suffice. This variable normally contains the symbol ONCE, meaning reanalyze only 
the current node. NETA-EVALUATE normally ensures, before analyzing a node, that 
all descendant nodes are analyzed. Thus the initial pass-1 analysis occurs 
incrementally, interleaved with the meta-evaluation process. 

The switch *REANALYZE* may be set to the symbol ALL to force all 
descendants of a node to be reanalyzed before analyzing the node itself. This 
ability is provided to test for certain bugs in the optimizer. If the 
incremental analysis should fail for some reason, then the descendant nodes may 
not contain correct information (for example, their information slots may be 
empty!). The ALL setting ensures that a consistent analysis is obtained. If the 
optimizer's behavior differs depending on whether *REANALYZE* contains ONCE or 
ALL, then a problem with the incremental analysis is implicated. This switch has 
been very useful for isolating such bugs. 

The next group of functions are utilities for META-SUBSTITUTE which deal 
with sets of side-effects. 

EFFS- INTERSECT takes the intersection of two sets of side-effects. It is 
just like INTERSECT, except that it also knows about the two special sets ANY and 
NONE. 

EFFECTLESS is a predicate which is true of an empty set of side-effects. 

EFFECTLESS-EXCEPT-CONS is a predicate true of a set of side-effects which 
is empty except possibly for the CONS side-effect. 

PASSABLE takes a node and two sets of side-effects, which should be the 
EFFS and AFFD sets from some other node. PASSABLE is a predicate which is true 
if the given node, which originally preceded the second in the standard 
evaluation order, can legitimately be postponed until after the second is 
evaluated. That is, it is true iff the first node can "pass" the second during 
the substitution process. 
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(DEFINE REANALYZE1 

(LAMBDA (NODE) 

(PASS1-ANALYZE NODE *REANALYZE* T))) 

(SET' *REANALYZE* 'ONCE) 



HERE WE DETERMINE, FOR EACH VARIABLE NODE WHOSE VAR IS THE ONE 
GIVEN, WHETHER IT IS POSSIBLE TO SUBSTITUTE IN FOR IT; THIS IS 
DETERMINED ON THE BASIS OF SIDE EFFECTS. THIS IS DONE BY 
WALKING THE PROGRAM, STOPPING WHEN A SIDE-EFFECT BLOCKS IT. 
A SUBSTITUTION IS MADE IFF IS VARIABLE NODE IS REACHED IN THE WALK. 

THERE IS A BUG IN THIS THEORY TO THE EFFECT THAT A CATCH 
WHICH RETURNS MULTIPLY CAN CAUSE AN EXPRESSION EXTERNAL 
TO THE CATCH TO BE EVALUATED TWICE. THIS IS A DYNAMIC PROBLEM 
WHICH CANNOT BE RESOLVED AT COMPILE TIME, AND SO WE SHALL 
IGNORE IT FOR NOW. 

WE ALSO RESET THE METAP FLAG ON ALL NODES WHICH HAVE A 
SUBSTITUTION AT OR BELOW THEM, SO THAT THE META-EVALUATOR WILL 
RE-PENETRATE TO SUBSTITUTION POINTS, WHICH MAY ADMIT FURTHER 
OPTIMIZATIONS. 



(DEFINE EFFS-INTERSECT 
(LAMBDA (A B) 

(COND ((EQ A 'ANY) B) 

((EQ B 'ANY) A) 

((EQ A 'NONE) A) 

((EQ B 'NONE) B) 
(T (INTERSECT A B))))) 

(DEFINE EFFECTLESS 

(LAMBDA (X) (OR (NULL X) (EQ X 'NONE)))) 

(DEFINE EFFECTLESS-EXCEPT-CONS 

(LAMBDA (X) (OR (EFFECTLESS X) (EQUAL X '(CONS))))) 

(DEFINE PASSABLE 

(LAMBDA (NODE EFFS AFFD) 

(BLOCK (IF (EMPTY (NODE\EFFS NODE)) 

(ERROR '|Pass 1 Analysis Missing - PASSABLEI 
NODE 

•FAIL-ACT)) 
(AND (EFFECTLESS (EFFS-INTERSECT EFFS (NODE\AFFD NODE))) 
(EFFECTLESS (EFFS-INTERSECT AFFD (NODE\EFFS NODE))) 
(EFFECTLESS-EXCEPT-CONS (EFFS-INTERSECT EFFS (NODE\EFFS NODE))))))) 
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META-SUBSTITUTE takes a node-tree ARG, a variable name VAR, and another 
node-tree BOD, and wherever possible substitutes copies of ARG for occurrences of 
VAR within BOD. The complexity of this process is due almost entirely to the 
necessity of determining the extent of "wherever possible". 

META-SUBSTITUTE merely spreads out the EFFS and AFFD slots of ARG to make 
them easy to refer to, makes an error check, and then passes the buck to the 
internal LABELS routine SUBSTITUTE, which does the real work. 

SUBSTITUTE recurs over the structure of the node-tree. At each node it 
first checks to see whether VAR is in the REFS set of that node. This is purely 
an efficiency hack: if VAR is not in the set, then it cannot occur anywhere 
below that node in the tree, and so SUBSTITUTE can save itself the work of a 
complete recursive search of that portion of the node-tree. 

SUBSTITUTE plays another efficiency trick in cahoots with META- EVALUATE 
to save work. Whenever SUBSTITUTE actually replaces an occurrence of VAR with a 
copy of ARG, the copy of ARG will have its METAP flag turned off (set to NIL). 
Now SUBSTITUTE propagates the METAP flag back up the node-tree; when all sub- 
nodes of a node have had SUBSTITUTE applied to them, then if the METAP flag of 
the current node is still set, it is set to the AND of the flags of the subnodes. 
Thus any node below which a substitution has occurred will have its METAP flag 
reset. More to the point, any node which after the substitution still has its 
METAP flag set has had no substitutions occur below it. META-EVALUATE can then 
be applied to BOD after all substitutions have been tried (this occurs in META- 
COMBINATION-LAMBDA), and META-EVALUATE will only have to re-examine those parts 
of BOD which have changed. In particular, if no substitutions were successful, 
META-EVALUATE will not have to re-examine BOD at all. 

If the variable is referenced at or below the node, it breaks down into 
cases according to the type of the node. 

For a CONSTANT, no action is necessary. 

For a VARIABLE, no action is taken unless the variable matches VAR, in 
which case the node is erased and a copy of ARG is made and returned in its 
place. The SUBSTP slot of the original ARG is set as a flag to META-COMBINATION- 
LAMBDA (q.v.), to let it know that at least one substitution succeeded. 

For a LAMBDA, substitution can occur in the body only if ARG has no side- 
effects except possibly CONS. This is because evaluation of the LAMBDA- 
expression (to produce a closure) will not necessarily cause evaluation of the 
side-effect in ARG at the correct time. The special case of a LAMBDA occurring 
as the function in a COMBINATION is handled separately below. 

For an IF, substitution is attempted in the predicate. It is attempted 
in the other two sub-trees only if ARG can pass the predicate. 

For an ASET' or a CATCH, substitution is attempted in the body. The same 
is true of LABELS, but substitution is also attempted in the labelled function 
definitions. 
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(SET' *SUBST-COUNT* 0) 

(SET' *LAMBDA-BODY-SUBST* T) 

(SET' *LAMBDA-BODY-SUBST-TRY-COUNT* 0) 

(SET' *LAMBDA-BODY-SUBST-SUCCESS-COUNT* 0) 
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;COUNT OF SUBSTlfufToNS 

;SWITCH TO CONTROL SUBSTITUTION IN LAMBDA BODIES 
;COUNT THEREOF - TRIES 
;COUNT THEREOF - SUCCESSES 



(DEFINE 
META-SUBSTITUTE 
(LAMBDA 
(ARG VAR BOD) 

(LET ((EFFS (NODEXEFFS ARG)) 
(AFFD (NODEXAFFD ARG))) 
(IF (EMPTY EFFS) 

(ERROR '|Pass 1 Analysis Screwed Up - META-SUBSTITUTE I ARG 'FAIL-ACT)) 
(LABELS " 

((SUBSTITUTE 
(LAMBDA (NODE) 

(IF (OR (EMPTY (NODEXREFS NODE)) 

(NOT (MEMO VAR (NODE\REFS NODE)))) EFFICIENCY HACK 

NODE 

(LET ((FM (NODEXFORM NODE))) 
(EQCASE (TYPE FM) 

(CONSTANT NODE) 
(VARIABLE 
(IF (EQ (VARIABLEXVAR FM) VAR) 
(BLOCK (ERASE-ALL-NOOES NODE) 

(INCREMENT *SUBST-COUNT* ) 
(ALTER-NODE ARG (SUBSTP := T)) 
(COPY-CODE ARG)) 
NODE ) ) 
(LAMBDA 
(IF (AND (EFFECTLESS-EXCEPT-CONS EFFS) (EFFECTLESS AFFD)) 

(ALTER-LAMBDA FM (BODY := (SUBSTITUTE (LAMBDA\BODY FM))))) 
(IF (NODEXMETAP NODE) 

(ALTER-NODE NODE (METAP := (NODEXMETAP (LAMBOAXBODY FM))))) 
NODE) 
(IF 
(ALTER-IF FM (PRED := (SUBSTITUTE (IFXPREO FM)))) 
(IF (PASSABLE (IFXPRED FM) EFFS AFFD) 
(ALTER-IF FM 

(CON := (SUBSTITUTE ( IFXCON FM))) 
(ALT := (SUBSTITUTE (IF\ALT FM))))) 
(IF (NODEXMETAP NODE) 
(ALTER-NODE NODE 

(METAP := (AND (NODEXMETAP (IFXPRED FM)) 
(NODEXMETAP (IF\CON FM)) 
(NODEXMETAP ( IF\ALT FM)))))) 
NODE ) 
(ASET 
(ALTER-ASET FM (BODY := (SUBSTITUTE (ASETXBODY FM)))) 
(IF (NODEXMETAP NODE) 

(ALTER-NODE NODE (METAP := (NODEXMETAP (ASETXBODY FM))))) 
NODE ) 
(CATCH 
(ALTER-CATCH FM (BODY := (SUBSTITUTE (CATCHXBODY FM)))) 
(IF (NODEXMETAP NODE) 

(ALTER-NODE NODE (METAP := (NODEXMETAP (CATCHXBODY FM))))) 
NODE) 
(LABELS 
(ALTER-LABELS FM (BODY := (SUBSTITUTE (LABELSXBODY FM)))) 
(DO ((D (LABELSXFNDEFS FM) (CDR D)) 

(MP (NODEXMETAP (LABELSXBODY FM)) 
(AND MP (NODEXMETAP (CAR D))))) 
((NULL D) 
(IF (NODEXMETAP NODE) 

(ALTER-NODE NODE (METAP :*= MP)))) 
(RPLACA D (SUBSTITUTE (CAR D)))) 
NODE) 
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The most complicated case is the COMBINATION. First it is determined (in 
the variable X) whether ARG can correctly pass all of the arguments of the 
combination. (It is not possible to substitute into any argument unless all can 
be passed, because at this time it has not been decided in what order to evaluate 
them. This decision is the free choice of CONVERT-COMBINATION below.) If it 
can, then substitution is attempted in all of the arguments except the function 
itself. Then two kinds of function are distinguished. If it is not a LAMBDA, a 
straightforward recursive call to SUBSTITUTE is used. If it is, then 
substitution is attempted in the body of the LAMBDA (not in the LAMBDA itself; 
substitution in a LAMBDA requires that ARG be EFFECTLESS-EXCEPT-CONS, but in this 
special case we know that the LAMBDA-expression will be invoked immediately, and 
so it is all right if ARG has side-effects). 
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070 (COMBINATION " 

071 (LET ((ARGS (COMBINATIONXARGS FM))) 

072 (DO ((A ARGS (COR A)) 

073 (X T (ANO X (PASSABLE (CAR A) EFFS AFFD)))) 

074 ((NULL A) 

075 (IF X (DO ((A (CDR ARGS) (CDR A))) 

076 ((NULL A)) 

077 (RPLACA A (SUBSTITUTE (CAR A))))) 

078 (IF (AND *LAMBDA-BODY-SUBST* 

079 (EO (TYPE (NODEXFORM (CAR ARGS))) 'LAMBDA)) 

080 (LET ((FN (NODEXFORM (CAR ARGS)))) 

081 (INCREMENT *LAMBDA-BODY-SUBST-TRY-COUNT* ) 

082 (COND (X 

083 (INCREMENT 

J 8 * *LAMBDA-BODY-SUBST-SUCCESS-COUNT* ) 

085 (ALTtR-LAMBDA 

086 FN 

087 (BODY := (SUBSTITUTE 

088 (LAMBDAXBODY FN)))))) 
069 (IF (NODEXMETAP (CAR ARGS)) 

090 (ALTER-NODE 

091 (CAR ARGS) 

092 (METAP := (NODEXMETAP 

J? 3 (LAMBDAXBODY FN)))))) 

°*Z (IF X (RPLACA ARGS (SUBSTITUTE (CAR ARGS))))))) 

095 (DO ((A ARGS (CDR A)) 

096 (MP T (AND MP (NODEXMETAP (CAR A))))) 

097 ((NULL A) 

098 (IF (NOOEXMETAP NODE) 

099 (ALTER-NODE NODE (METAP := MP)))))) 
JOO NODE))))))) 

101 (SUBSTITUTE BOD))))) 
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COPY-CODE is used by META-SUBSTITUTE to make copies of node-trees 
representing code. It invokes COPY-NODES with appropriate additional arguments. 

COPY-NODES does the real work. The argument ENV is analogous to the 
argument ENV taken by ALPHATIZE. However, variables are not looked-up in ENV by 
COPY-NODES; ENV is maintained only to install in the new nodes for debugging 
purposes. The argument RNL is a "rename list" for variables. When a node is 
copied which binds variables, new variables are created for the copy. RNL 
provides a mapping from generated names in the original code to generated names 
in the copy (as opposed to ENV, which maps user names to generated names in the 
copy). Thus, when a LAMBDA node is copied, new names are generated, and PAIRLIS 
is used to pair new names with the LAMBDA\VARS of the old node, adding the new 
pairs to RNL. 

A neat trick to aid debugging is that the new names are generated by 
using the old names as the arguments to GENTEMP. In this way the name of a 
generated variable contains a history of how it was created. For example, VAR- 
34-73-156 was created by copying the LAMBDA node which bound VAR-34-73, which in 
turn was copied from the node which bound VAR-34. Copies of CATCH and LABELS 
variables are generated in the same way. 

The large EQCASE handles the different types of nodes. The result is 
then given to NODIFY, the same routine which creates nodes for ALPHATIZE. Recall 
that NODIFY initializes the METAP slot to NIL; the next meta-evaluation which 
comes along will cause pass-1 analysis to be performed on the new copies. 

Note particularly that the UVARS list of a LAMBDA node is copied , for the 
same reason that ALPHA-LAMBDA makes a copy: META-COMBINATION-LAMBDA may alter it 
destructively. 
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(DEFINE COPY-CODE 

(LAMBDA (NODE) 

(REANALYZE1 (COPY-NODES NODE (NODE\ENV NODE) NIL)))) 

(DEFINE 
COPY- NODES 

(LAMBDA (NODE ENV RNL ) 
(NODIFY 
(LET ((FM (NODE\FORM NODE))) 
(EQCASE (TYPE FM) 
(CONSTANT 

(CONS-CONSTANT (VALUE 
(VARIABLE 
(CONS-VARIABLE (VAR = 
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(CONSTANTWALUE FM)))) 



(LET ((SLOT (ASSQ ( VARIABLEWAR FM) RNL))) 

(IF SLOT (CADR SLOT) (VARIABLEWAR FM)))) 
(GLOBALP = (VARIABLEXGLOBALP FM)))) 
(LAMBDA 
(LET ((VARS (AMAPCAR GENTEMP (LAMBDAWARS FM)))) 

(CONS-LAMBDA (UVARS = (APPEND (LAMBDA\UVARS FM) NIL)) 
(VARS * VARS) 
(BODY = (COPY- NODES 

(LAMBDA\BODY FM) 

(PAIRLIS (LAMBDA\UVARS FM) VARS ENV) 
(PAIRLIS (LAMBDAWARS FM) VARS RNL)))))) 
(IF (CONS- IF (PRED = (COPY-NODES ( IF\PRED FM) ENV RNL)) 
(CON = (COPY-NODES (IF\CON FM) ENV RNL)) 
(ALT = (COPY-NODES (IF\ALT FM) ENV RNL)))) 
(ASET 
(CONS-ASET (VAR = (LET ((SLOT (ASSQ (ASETWAR FM) RNL))) 

(IF SLOT (CADR SLOT) (ASETWAR FM)))) 
(GLOBALP » (ASEAGLOBALP FM)) 
(BODY = (COPY-NODES (ASET\BODY FM) ENV RNL)))) 
(CATCH 
(LET ((VAR (GENTEMP (CATCHWAR FM))) 
(UVAR (CATCHMJVAR FM))) 
(CONS-CATCH (UVAR = (CATCH\UVAR FM)) 
(VAR « VAR) 
(BODY * (COPY-NODES 

(CATCHNBODY FM) 
(CONS (LIST UVAR VAR) ENV) 
(CONS (LIST (CATCHWAR FM) VAR) RNL)))))) 
(LABELS 
(LET ((FNVARS (AMAPCAR GENTEMP (LABELS\FNVARS FM)))) 

(LET ((LENV (PAIRLIS (LABELS\UFNVARS FM) FNVARS ENV)) 
(LRNL (PAIRLIS (LABELS\FNVARS FM) FNVARS RNL))) 
(CONS-LABELS (UFNVARS = (LABELS\UFNVARS FM)) 
(FNVARS = FNVARS) 
(FNDEFS = (AMAPCAR 

(LAMBDA (N) (COPY-NODES N LENV LRNL)) 
(LABELS\FNDEFS FM))) 
(BODY - (COPY-NODES (LABELS\BODY FM) 
LENV 
LRNL)))))) 



(COMBINATION 
(CONS-COMBINATION (ARGS 



(NODE\SEXPR NODE) 
ENV))) 



(WARNP 



(AMAPCAR (LAMBDA (N) (COPY-NODES N ENV RNL)) 

(COMBINATION\ARGS FM))) 
■ (COMBINATIONWARNP FM)))))) 
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The next several functions process the node-tree produced, analyzed, and 
optimized by pass 1, converting it to another representation. This new 
representation is a tree structure very similar to the node-tree, but has 
different components for the pass-2 analysis. We will call this the "cnode- 
tree". The "c" stands for "Continuation-passing style": for the conversion 
process transforms the node-tree into a form which uses continuation-passing to 
represent the control and data flow within the program. 

We define a new collection of data types used to construct cnode-trees. 
The C1SIODE data type is analogous to the NODE data type; one component CFORM 
contains a variant structure which is specific to the programming construct 
represented by the CNODE. 

The types CVARIABLE, CLAMBDA, CIF, CASET, CLABELS, and COMBINATION 
correspond roughly to their non-C counterparts in pass 1. 

Type TRIVIAL is used to represent pieces of code which were designated 
trivial in pass 1 (TRIVP slot = T) by TRIV-ANALYZE; the NODE component is simply 
the pass-1 node-tree for the trivial code. This is the only case in which part 
of the pass-1 node-tree survives the conversion process to be used in pass 2. 

A CONTINUATION is just like a CLAMBDA except that it has only one bound 
variable VAR. This variable can never appear in a CASET, and so the CONTINUATION 
type has no ASETVARS slot; all other slots are similar to those in a CLAMBDA 
structure . 

A RETURN structure is just like a COMBINATION, except that whereas a 
CCOMBINATION may invoke a CLAMBDA which may take any number of arguments, a 
RETURN may invoke only a CONTINUATION on a single value. Thus, in place of the 
ARGS slot of a CCOMBINATION, which is a list of cnodes, a RETURN has two slots 
CONT and VAL, each of which is a cnode. 

(In retrospect, this was somewhat of a design error. The motivation was 
that the world of closures could be dichotomized into LAMBDA-closures and 
continuation-closures, as a result of the fundamental semantics of the language: 
one world is used to pass values "down" into functions, and the other to pass 
values "up" from functions. Combinations can similarly be dichotimized, and I 
thought it would be useful to reflect this distinction in the data types to 
enforce and error-check this dichotomy. However, as it turned out, there is a 
great deal of code in pass 2 which had to be written twice, once for each 
"world", because the data types involved were different. It would be better to 
have a single structure for both CLAMBDA and CONTINUATION, with an additional 
slot flagging which kind it was. Then most code in pass 2 could operate on this 
structure without regard for which "world" it belonged to, and code which cared 
could check the flag.) 
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;;; THIS INVOLVES MAKING A COMPLETE COPY OP THE PROGRAM IN TERMS 
;;; OF THE FOLLOWING NEW DATA STRUCTURES: 

(DEFTYPE CNOOE (ENV REFS CLOVARS CFORM)) 

;ENV ENVIRONMENT (A LIST OF VARIABLES, NOT A MAPPING; DEBUGGING ONLY) 

;REFS VARIABLES BOUND ABOVE AND REFERENCED BELOW THIS CNODE 
;CLOVARS VARIABLES REFERRED TO AT OR BELOW THIS CNODE BY CLOSURES 
; (SHOULD BE A SUBSET OF REFS) 

;CFORM ONE OF THE BELOW TYPES 
(DEFTYPE TRIVIAL (NODE)) 

;NODE A PASS-1 NODE TREE 
(DEFTYPE CVARIABLE (VAR)) 

;VAR GENERATED VARIABLE NAME 

(DEFTYPE CLAMBDA (VARS BODY FNP TVARS NAME DEP MAXDEP CONSENV CLOSEREFS ASETVARS)) 
FNP NON-NIL => NEEON'T MAKE A FULL CLOSURE OF THIS 

CLAMBDA. MAY BE 'NOCLOSE OR 'EZCLOSE (THE FORMER 
MEANING NO CLOSURE IS NECESSARY AT ALL, THE LATTER 
THAT THE CLOSURE IS MERELY THE ENVIRONMENT). 
TVARS THE VARIABLES WHICH ARE PASSED THROUGH TEMP LOCATIONS 
ON ENTRY. NON-NIL ONLY IF FNP=' NOCLOSE ; THEN IS 
NORMALLY THE LAMBDA VARS, BUT MAY BE DECREASED 
TO ACCOUNT FOR ARGS WHICH ARE THEMSELVES KNOWN NOCLOSE' S, 
OR WHOSE CORRESPONDING PARAMETERS ARE NEVER REFERENCED. 
THE TEMP VARS INVOLVED START IN NUMBER AT DEP. 
NAME THE PROG TAG USED TO LABEL THE FINAL OUTPUT CODE FOR THE CLAMBDA 
DEP DEPTH OF TEMPORARY REGISTER USAGE WHEN THE CLAMBDA IS INVOKED 

MAXDEP MAXIMUM DEPTH OF REGISTER USAGE WITHIN CLAMBDA BODY 
CONSENV THE "CONSED ENVIRONMENT" WHEN THE CLAMBDA IS EVALUATED 
CLOSEREFS VARIABLES REFERENCED BY THE CLAMBDA WHICH ARE NOT IN 
THE CONSED ENVIRONMENT AT EVALUATION TIME, AND SO MUST BE 
ADDED TO CONSENV AT THAT POINT TO MAKE THE CLOSURE 
ASETVARS THE ELEMENTS OF VARS WHICH ARE EVER SEEN IN A CASET 
(DEFTYPE CONTINUATION (VAR BODY FNP TVARS NAME DEP MAXDEP CONSENV CLOSEREFS)) 

;COMPONENTS ARE AS FOR CLAMBDA 
(DEFTYPE CIF (PRED CON ALT)) 
(DEFTYPE CASET (CONT VAR BODY)) 
(DEFTYPE CLABELS (FNVARS FNDEFS FNENV EASY CONSENV BODY)) 

FNENV A LIST OF VARIABLES TO CONS ONTO THE ENVIRONMENT BEFORE 

CREATING THE CLOSURES AND EXECUTING THE BODY 
EASY NON-NIL IFF NO LABELED FUNCTION IS REFERRED TO 
AS A VARIABLE. CAN BE 'NOCLOSE OR 'EZCLOSE 
(REFLECTING THE STATUS OF ALL THE LABELLED FUNCTIONS) 
CONSENV AS FOR CLAMBDA 
(DEFTYPE COMBINATION (ARGS)) 

;ARGS LIST OF CNODES REPRESENTING ARGUMENTS 
(DEFTYPE RETURN (CONT VAL ) ) 

;CONT CNODE FOR CONTINUATION 
;VAL CNODE FOR VALUE 
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CNODIFY is for cnode-trees what NODIFY was for node-trees. It takes a 
CFORM and wraps a CNODE structure around it. 

CONVERT is the main function of the conversion process; it is invoked by 
COMPILE on the result (META-VERSION) of pass 1. CONVERT dispatches on the type of 
node to be converted, often calling some specialist which may call it back 
recursively to convert subnodes. CONT may be a cnode, or NIL. If it is a cnode, 
then that cnode is the code for a continuation which is to receive as value that 
produced by the code to be converted. That is, when CONVERT finishes producing 
code for the given node (the first argument to CONVERT), then in effect a RETURN 
is created which causes the value of the generated code to be returned to the 
code represented by CONT (the second argument to CONVERT). Sometimes this RETURN 
cnode is created explicitly (as for CONSTANT and VARIABLE nodes), and sometimes 
only implicitly, by passing CONT down to a specialist converter. 

MP is T if optimization was performed by pass 1, and NIL otherwise. This 
argument is for debugging purposes only: CONVERT compares this to the METAP slot 
of the pass-1 nodes in order to detect any failures of the incremental 
optimization and analysis process. CONVERT also makes some other consistency 
checks . 
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002 (DEFINE CNODIFY ~ ' — L -^—-^^-^L 

003 (LAMBDA (CFORM) 

004 (CONS-CNODE (CFORM = CFORM)))) 
005 

006 (DEFINE CONVERT 

00 7 (LAMBDA (NODE CONT MP) 

008 (LET ((FM (NOOENFORM NODE))) 

009 (IF (EMPTY (NODENTRIVP NODE)) 

010 (ERROR ' |Pass 1 analysis missingl NODE 'FAIL-ACT)) 

011 (OR (EO (NODENMETAP NODE) MP) 

012 (ERROR ' |Meta-eva1uation Screwed Up METAP| NODE 'FAIL-ACT)) 

013 (EQCASE (TYPE FM) 

014 (CONSTANT 

015 (OR (NODENTRIVP NODE) 

°J5 (ERROR ' |Non-trivial Constant| NODE 'FAIL-ACT)) 

017 (MAKE-RETURN (CONS-TRIVIAL (NODE = NODE)) CONT)) 

018 (VARIABLE 

019 (OR (NODENTRIVP. NODE) 

020 (ERROR ' |Non-trivial Variable! 'FAIL-ACT)) 

021 (MAKE-RETURN (CONS-TRIVIAL (NODE = NODE)) CONT)) 

022 (LAMBDA (MAKE-RETURN (CONVERT-LAMBDA-FM NODE NIL MP) CONT)) 
° 23 O f (OR CONT (ERROR ' (Null Continuation to IF | NODE 'FAIL-ACT)) 
024 (CONVERT- IF NODE FM CONT MP)) 

?" ( ASET (OR CONT (ERROR ' (Null Continuation to ASET| NODE 'FAIL-ACT)) 

026 (CONVERT-ASET NODE FM CONT MP)) 

° n { 7 a (CATCH (OR CONT (ERROR ' |Null Continuation to CATCH | NODE 'FAIL-ACT)) 

028 (CONVERT-CATCH NODE FM CONT MP)) 

° n ll (LABELS (OR CONT (ERROR '|Null Continuation to LABELS| NODE 'FAIL-ACT)) 

030 (CONVERT-LABELS NODE FM CONT MP)) 

H\ (COMBINATION (OR CONT (ERROR ' |Nu!l Continuation to Combination! 

032 NODE 

2!f 'FAIL-ACT)) 



034 



(CONVERT-COMBINATION NODE FM CONT MP)))))) 
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MAKE-RETURN takes a CFORM (one of the types TRIVIAL, CVARIABLE, ...) and 
a continuation, and constructs an appropriate returning of the value of the CFORM 
to the continuation. First the CFORM is given to CNODIFY. If the continuation 
is in fact NIL (meaning none), this new cnode is returned; otherwise a RETURN 
cnorip is constructed. 

CONVERT-LAMBDA-FM takes a LAMBDA node and converts it into a CLAMBDA 
cnode. The two are isomorphic, except that an extra variable is introduced as an 
extra first parameter to the CLAMBDA. Conceptually, this variable will be bound 
to a continuation when the CLAMBDA is invoked at run time; this continuation is 
the one intended to receive the value of the body of the CLAMBDA. This is 
accomplished by creating a new variable name CONT-nnn, which is added into the 
lambda variables. A new CVARIABLE node is made from it, and given to CONVERT as 
the continuation when the body of the LAMBDA node is to be recursively converted. 

The CNAME argument is used for a special optimization trick by CONVERT- 
COMBINATION, described below. 

CONVERT-IF distinguishes several cases, to simplify the converted code. 
First, if the entire IF node is trivial, then a simple CTRIVIAL node may be 
created for it. Otherwise, the general strategy is to generate code which will 
bind the given continuation to a variable and evaluate the predicate. This 
predicate receives a continuation which will examine the resulting value (with a 
CIF), and then perform either the consequent or alternative, which are converted 
using the bound variable as the continuation. The reason that the original 
continuation is bound to a variable is because it would be duplicated by using it 
for two separate calls to CONVERT, thereby causing duplicate code to be generated 
for it. A schematic picture of the general strategy is: 

NODE = (IF a b c) and CONT = k becomes 

((CLAMBDA (q) 

(RETURN (CONTINUATION (p) 

(CIF p 



a)) 
k) 



(RETURN q b) 
(RETURN q c))) 



Now there are two special cases which allow simplification. First, if the given 
continuation is already a cvariable, there is no point in creating a new one to 
bind it to. This eliminates the outer COMBINATION and CLAMBDA. Second, if the 
predicate a is trivial (but the whole IF is not, because the consequent b or the 
alternative c is non-trivial), then the CONTINUATION which binds p is 
unnecessary. 
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RABBIT 568 
(DEFINE MAKE -RE TURN 

(LAMBDA (CFORM CONT) 

(LET ((CN (CNODIFY CFORM))) 
(IF CONT 

(CNODIFY (CONS-RETURN (CONT - CONT) (VAL 
CN)))) 
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CN))) 



(OR CNAME CV)))) 



(DEFINE CONVERT-LAMBDA-FM 

(LAMBDA (NODE CNAME MP) 

(LET ((CV (GENTEMP 'CONT)) 

(FM (NODE\FORM NODE))) 
(CONS-CLAMBDA (VARS * (CONS CV (LAMBDAWARS FM))) 
(BODY = (CONVERT (LAMBDA\BODY FM) 
(CNODIFY 

(CONS-CVARIABLE (VAR 
MP)))))) 

ISSUES FOR CONVERTING IF: 

(1) IF WHOLE IF IS TRIVIAL. MAY JUST CREATE A CTRIVIAL. 

(2) IF CONTINUATION IS NON-CVARIABLE , MUST BIND A VARIABLE TO IT. 

(3) IF PREDICATE IS TRIVIAL, MAY JUST STICK IT IN SIMPLE CIF. 

(DEFINE CONVERT-IF 

(LAMBDA (NODE FM CONT MP) 

(IF (NODE\TRIVP NODE) 

(MAKE-RETURN (CONS-TRIVIAL (NODE = NODE)) CONT) 
(LET ((CVAR (IF (EQ (TYPE (CNODE\CFORM CONT)) 'CVARIABLE) 
NIL 

(GENTEMP 'CONT))) 
(PVAR (IF (NODENTRIVP (IF\PRED FM)) 
NIL 

(NODENNAME ( IF\PRED FM))))) 
(LET ((ICONT (IF CVAR 

(CNODIFY (CONS-CVARIABLE (VAR = CVAR))) 
CONT)) 
(IPRED (IF PVAR 

(CNODIFY (CONS-CVARIABLE (VAR = PVAR))) 
(CNODIFY (CONS-TRIVIAL (NODE = ( IF\PRED FM))))))) 
(LET ((CIF (CNODIFY 
(CONS-CIF 
(PRED ■ IPRED) 

(CON = (CONVERT ( IF\CON FM) ICONT MP)) 
(ALT = (CONVERT ( IF\ALT FM) 
(CNODIFY 
(CONS-CVARIABLE 
(VAR = (CVARIABLEWAR 

(CNODENCFORM ICONT))))) 
MP)))))) 
(LET ((FOO (IF PVAR 

(CONVERT (IFXPRED FM) 

(CNODIFY (CONS-CONTINUATION (VAR « PVAR) 

(BODY - CIF))) 
MP) 
CIF))) 
(IF CVAR 

' (CNODIFY 

(CONS-CCOMBINATION 
(ARGS = (LIST (CNODIFY 

(CONS-CLAMBDA 
(VARS = (LIST CVAR)) 
(BODY « FOO))) 
CONT)))) 
FOO)))))))) 
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This is all done as follows. First CVAR and PVAR are bound to generated 
names if necessary, CVAR for binding the continuation and PVAR for binding the 
predicate value. Then ICONT and IPRED (the "I" is a mnemonic for "internal") are 
bound to the cnodes to be used for the two conversions of consequent and 
alternative, and for the predicate of the CIF, respectively. CIF is then bound 
to the cnode for the CIF code, including the conversions of consequent and 
alternative. Finally, using FOO as an intermediary, CONVERT-IF first 
conditionally arranges for conversion of a non-trivial predicate, and then 
conditionally arranges for the binding of a non-cvariable continuation. The 
result of all this is returned as the final conversion of the original IF node. 

CONVERT-ASET is fairly straightforward, except that, as for IF nodes, a 
special case is made of trivial nodes, as determined by the TRIVP slot. 

The CATCH construct may be viewed as the user's one interface between the 
"LAMBDA world" and the "continuation world". CONVERT-CATCH arranges its 
conversion in such a way as to eliminate CATCH entirely. Because CONTINUATION 
cnodes provide an explicit representation for the continuations involved, there 
is no need at this level to have an explicit CCATCH sort of cnode. The general 
idea is: 

NODE = (CATCH a b) and CONT = k becomes 

((CLAMBDA (q) 

((CLAMBDA (a) (RETURN q b)) 
(CLAMBDA (*IGNORE* V) (RETURN q V)))) 
M 

In the case where the given continuation k is already a cvariable, then it need 
not be bound to a new one q. Note that the (renamed) user catch variable a is 
bound to a CLAMBDA which ignores its own continuation, and returns the argument V 
to the continuation of the CATCH. Thus the user variable a is bound not to an 
actual CONTINUATION, but to a little CLAMBDA which interfaces properly between 
the CLAMBDA world and the CONTINUATION world. The uses of CVAR and ICONT are 
analogous to their uses in CONVERT-IF. 
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002 (DEFINE CONVERT -ASET 

003 (LAMBDA (NODE FM CONT MP) 

004 ( IF (NODEURIVP NODE) 

005 (MAKE-RETURN (CONS-TRIVIAL (NODE = NODE)) CONT) 

006 (CONVERT (ASEABODY FM) 

00 7 (LET ((NM (NODE\NAME (ASEABODY FM)))) 

008 (CNODIFY 

009 (CONS-CONTINUATION 

010 (VAR = NM) 

Oil (BODY = (CNODIFY 

012 (CONS-CASET 

013 (CONT = CONT) 

014 (VAR = (ASEAVAR FM)) 

015 (BODY = (CNODIFY (CONS-CVARIABLE 

016 (VAR = NM)))))))))) 

017 MP)))) 
018 

019 ;;; ISSUES FOR CONVERTING CATCH: 

020 ;;; (1) MUST BIND THE CATCH VARIABLE TO A FUNNY FUNCTION WHICH IGNORES ITS CONTINUATION: 

021 ;;; (2) IF CONTINUATION IS NON-CVARIABLE , MUST BIND A VARIABLE TO IT. 
022 

023 (DEFINE 

024 CONVERT-CATCH 

025 (LAMBDA (NODE FM CONT MP) 

026 (LET ((CVAR (IF (EQ (TYPE (CNODE\CFORM CONT) ) 'CVARIABLE) 

027 NIL 

028 (GENTEMP 'CONT)))) 

029 (LET ((ICONT (IF CVAR 

030 (CNODIFY (CONS-CVARIABLE (VAR = CVAR))) 

031 CONT))) 

032 (LET ((CP (CNODIFY 

033 (CONS-CCOMBINATION 

034 (ARGS = (LIST (CNODIFY 

035 (CONS-CLAMBDA 

036 (VARS = (LIST (CATCHWAR FM))) 

037 (BODY = (CONVERT (CATCHXBODY FM) ICONT MP)))) 

038 (CNODIFY 

039 (CONS-CLAMBOA 

040 (VARS = '(*IGNORE* V)) 

041 (BODY = (MAKE-RETURN 

042 (CONS-CVARIABLE (VAR - 'V)) 

043 (CNODIFY 

044 (CONS-CVARIABLE 

045 (VAR = (CVARIABLEWAR 

046 (CNODEXCFORM ICONT))))))))))))))) 

047 (IF CVAR (CNODIFY 

048 (CONS-CCOMBINATION 

049 (ARGS = (LIST (CNODIFY 

050 (CONS-CLAMBDA (VARS = (LIST CVAR)) 

051 (BODY = CP))) 

052 CONT)))) 

053 CP)))))) 
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CONVERT-LABELS simply converts all the labelled function definitions 
using NIL as the continuation for each. This reflects the fact that no code 
directly receives the results of closing the definitions; rather, they simply 
become part of the environment. The body of the LABELS is converted using the 
given continuation. 

To make things much simpler for the pass-2 analysis and the code 
generator, it is forbidden to use ASET' on a LABELS-bound variable. This is an 
arbitrary restriction imposed by RABBIT (out of laziness on my part and a desire 
to concentrate on more important issues), and not one inherent in the SCHEME 
language. This restriction is unnoticeable in practice, since one seldom uses 
ASET' at all, let alone on a LABELS variable. 



The conversion of COMBINATION nodes is the most complex of all cases. 
First, a trivial combination becomes simply a TRIVIAL cnode. Otherwise, the 
overall idea is that each argument is converted, and the continuation given to 
the conversion is the conversion of all the following arguments. The conversion 
of the last argument uses a continuation which performs the invocation of 
function on arguments, using all the bound variables of the generated 
continuations. The end result is a piece of code which evaluates one argument, 
binds a variable to the result, evaluates the next, etc., and finally uses the 
results to perform a function call. 

To simplify the generated code, the arguments are divided into two 
classes. One class consists of trivial arguments and LAMBDA-expressions (this 
class is precisely the class of "trivially evaluable" expressions defined in 
[Imperative]), and the other class consists of the remaining arguments. The 
successive conversion using successive continuations as in the general theory is 
only performed on the latter class of arguments. The trivially evaluable 
expressions are included along with the bound variables for non-trivial argument 
values in the final function call. For example, one might have something like: 

NODE = (F00 (CONS A B) (BAR A) B (BAZ B)) and CONT = k becomes 

(RETURN (CONTINUATION (x) 

(RETURN (CONTINUATION (y) 

(FOO k (CONS A B) x B y)) 
(BAZ B))) 
(BAR A)) 

where FOO, (CONS A B), and B are trivial, but (BAR A) and (BAZ B) are not. 
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ISSUES FOR CONVERTING LABELS: 

(1) MUST CONVERT ALL THE NAMED LAMBDA-EXPRESSIONS, 

(2) TO MAKE THINGS EASIER LATER. WE FORBID ASET ON 

(OEFINE 



R M? * I 568 05/15/78 Page 30 

USING A NULL CONTINUATION. 
A LABELS VARIABLE. 



CONVERT-LABELS 

(LAMBDA (NODE FM CONT MP) 

(DO ((F (LABELSNFNDEFS FM) (CDR F)) 
(V (LABELSNFNVARS FM) (COR V)) 
(CF NIL (CONS (CONVERT (CAR F) NIL MP) CF))) 
((NULL F) 
(CNODIFY (CONS-CLABELS 



= (LABELSXFNVARS FM)) 

= (NREVERSE CF ) ) 

(CONVERT (LABELSXBODY FM) CONT MP))))) 



(FNVARS 
(FNDEFS 
(BODY = 
(AND (GET (CAR V) 'WRITE-REFS) 

(ERROR ' |Are you crazy, using ASET on a LABELS variable?! 
(CAR V) 
•FAIL-ACT))))) 
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The separation into two classes is accomplished by the outer DO loop. 
DELAY-FLAGS is a list of flags describing whether the code can be "delayed" (not 
converted using strung-out continuations) because it is trivially evaluable. The 
inner DO loop of the three (which loops on variables A, D, and Z, not A, D, and 
F!) then constructs the final function call, using the "delayed" arguments and 
generated continuation variables. The names used for the variables are the names 
of the corresponding nodes, which were generated by NODIFY. Finally, the middle 
DO loop (which executes last because the "inner" DO loop occurs in the 
initialization, not the body, of the "middle" one) generates the strung-out 
continuations, converting the non-delayable arguments in reverse order, so as to 
generate the converted result from the inside out. 

The net effect is that non-trivial arguments are evaluated from left-to- 
right, and trivial ones are also (as it happens, because of MacLISP semantics), 
but the two classes are intermixed. This is where RABBIT takes advantage of the 
SCHEME semantics which decree that arguments to a combination may be evaluated in 
any order. It is also why CHECK-COMBINATION-PEFFS tries to detect infractions of 
this rule. 

A special trick is that if the given continuation is a variable, and the 
combination is of the form ((LAMBDA ...) ...), then it is arranged to use the 
given continuation as the continuation for converting the body of the LAMBDA, 
rather than the extra variable which is introduced for a continuation in the 
LAMBDA variables list (see CONVERT-LAMBDA-FM) . This effectively constitutes the 
optimization of substituting one continuation variable for another, much as META- 
COMBINATION-LAMBDA may substitute one variable for another. (This turns out to 
be the only optimization of importance to be done on pass-2 cnode code; rather 
than building a full-blown optimizer for pass-2 cnode-trees, or arranging to make 
the optimizer usable on both kinds of data structures, it was easier to tweak the 
conversion of combinations.) The substitution is effected by passing a non-NIL 
CNAME argument to CONVERT-LAMBDA-FORM, as computed by the form (AND (NULL (CDR 
A)) ...). 
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ISSUES FOR CONVERTING COMBINATIONS: 

(1) TRIVIAL ARGUMENT EVALUATIONS ARE DELAYED AND ARE NOT BOUND TO THE VARIABLE OF 
A CONTINUATION. WE ASSUME THEREBY THAT THE COMPILER IS PERMITTED TO EVALUATE 
OPERANDS IN ANY ORDER. 

(2) ALL NON-DELAYABLE COMPUTATIONS ARE ASSIGNED NAMES AND STRUNG OUT WITH CONTINUATIONS. 

(3) IF CONT IS A CVARIABLE AND THE COMBINATION IS ((LAMBDA ...) ...) THEN WHEN CONVERTING 
THE LAMBDA-EXPRESSION WE ARRANGE FOR ITS BODY TO REFER TO THE CVARIABLE CONT RATHER 
THAN TO ITS OWN CONTINUATION. THIS CROCK EFFECTIVELY PERFORMS THE OPTIMIZATION OF 
SUBSTITUTING ONE VARIABLE FOR ANOTHER, ONLY ON CONTINUATION VARIABLES (WHICH COULDN'T 
BE CAUGHT BY META-EVALUATE ) . 



002 
003 
004 
005 
006 
007 
008 
009 
010 
Oil 
012 

013 (DEFINE 

014 CONVERT-COMBINATION 

015 (LAMBDA (NODE FM CONT MP) 

016 (IF (NODE\TRIVP NOOE ) 

017 (MAKE-RETURN (CONS-TRIVIAL (NODE = NODE)) CONT) 

018 (DO ((A (COMBINATIONNARGS FM) (CDR A)) 

019 (DELAY-FLAGS NIL 

020 (CONS (OR (NODE\TRIVP (CAR A)) 

021 (EQ (TYPE (NODEXFORM (CAR A))) 'LAMBDA)) 

022 DELAY-FLAGS))) 

023 ((NULL A) 

024 (DO ((A (REVERSE (COMBINATION\ARGS FM)) (CDR A)) 

025 (D DELAY-FLAGS (CDR D)) 

026 (F (CNODIFY 

027 (CONS-CCOMBINATION 

028 (ARGS = (DO ({A (REVERSE (COMBINATION\ARGS FM)) (CDR A)) 

029 (D DELAY-FLAGS (CDR D)) 

030 (Z NIL (CONS (IF (CAR D) 

031 (IF (EQ (TYPE (NO0E\FORM (CAR A))) 

032 'LAMBDA) 

033 (CNODIFY 

034 (CONVERT-LAMBDA-FM 

035 (CAR A) 

036 (AND (NULL (CDR A)) 

037 (EQ (TYPE 

038 (CNODENCFORM CONT)) 

039 'CVARIABLE) 

040 (CVARIABLEWAR 

041 (CNODENCFORM CONT))) 

042 MP)) 

043 (CNODIFY 

044 (CONS-TRIVIAL 

045 (NODE = (CAR A))))) 

046 (CNODIFY 

047 (CONS-CVARIABLE 

048 (VAR = (NODEXNAME (CAR A)))))) 

049 Z))) 

050 ((NULL A) (CONS (CAR Z) (CONS CONT (CDR Z)))))))) 

051 (IF (CAR D) F 

052 (CONVERT (CAR A) 

053 (CNODIFY (CONS-CONTINUATION 

054 (VAR = (NODENNAME (CAR A))) 

055 (BOOY = F))) 

056 MP)))) 

057 ((NULL A) F))))))) 
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Once the pass-2 cnode-tree is constructed, a pass-2 analysis is performed 
in a manner very similar to the pass-1 analysis. As before, successive routines 
are called which recursively process the code tree and pass information up and 
down, filling in various slots and putting properties on the property lists of 
variable names. 

The first routine, CENV-ANALYZE, is similar to ENV-ANALYZE, but differs 
in some important respects. Two slots are filled in for each cnode. The slot 
ENV is computed from the top down, while REFS is computed from the bottom up. 

ENV is the environment, a list of bound variables visible to the cnode. 
The ENV slot in the node-tree was a mapping (an alist), but this ENV is only a 
list. The argument ENV is used in the analysis of CVARIABLE and CASET nodes. 
The cnode slot ENV is included only for debugging purposes, and is never used by 
RABBIT itself. 

REFS is analogous to the REFS slot of a node-tree: it is the set of 
variables bound above and referenced below the cnode. It differs from the pass-1 
analysis in that variables introduced to name continuations and variables bound 
by continuations are also accounted for. In the case of a TRIVIAL cnode, 
however, the REFS are precisely those of the contained node. 

The argument FNP to CENV-ANALYZE in non-NIL iff the given cnode occurs in 
"functional position" of a COMBINATION or RETURN cnode. This is used when a 
variable is encountered; on the property list a VARIABLE-REFP property is placed 
iff FNP is NIL, indicating that the variable was referenced in "variable (non- 
function) position". This information will be used by the next phase, BIND- 
ANALYZE. 



WE WISH TO DETERMINE THE ENVIRONMENT AT EACH CNOOE , 
AND DETERMINE WHAT VARIABLES ARE BOUND ABOVE AND 
REFERRED TO BELOW EACH CNODE . 

FOR EACH CNODE WE FILL IN THESE SLOTS: 

ENV THE ENVIRONMENT SEEN AT THAT CNODE (A LIST OF VARS) 

REFS VARIABLES BOUND ABOVE AND REFERRED TO BELOW THAT CNODE 

FOR EACH VARIABLE REFERRED TO IN NON-FUNCTION POSITION 

BY A CVARIABLE OR CTRIVIAL CNODE WE GIVE A NON-NIL VALUE TO THE PROPERTY: 
VARIABLE-REFP 
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002 ;;; ENVIRONMENT ANALYSIS FOR CPS VERSION 
003 
004 
005 
006 
007 
008 
009 
010 
Oil 
012 
013 
014 

015 ;;; FNP IS NON-NIL IFF CNODE OCCURS IN FUNCTIONAL POSITION 
016 

017 (DEFINE 

018 CENV-ANALYZE 

019 (LAMBDA (CNODE ENV FNP) 

020 (LET ((CFM (CNODE\CFORM CNODE))) 

021 (ALTER-CNODE CNODE (ENV := ENV)) 
22 (EQCASE (TYPE CFM) 

023 (TRIVIAL 

024 (CENV-TRIV-ANALYZE ( TRIVIAL\NODE CFM) FNP) 

025 (ALTER-CNODE CNODE 

026 (REFS := (NODE\REFS (TRIVIALXNODE CFM))))) 

027 (CVARIABLE 

028 (LET ((V (CVARIABLEWAR CFM))) 

029 (ADDPROP V CNODE 'READ-REFS) 

030 (OR FNP (PUTPROP V T 'VARIABLE-REFP)) 

031 (ALTER-CNODE CNODE 

032 (REFS := (AND (MEMQ V ENV) 

033 (LIST (CVARIABLEWAR CFM))))))) 
34 (CLAMBDA 

035 (LET ((B (CLAMBOAXBODY CFM))) 

036 (CENV-ANALYZE B (APPEND (CLAMBOAWARS CFM) ENV) NIL) 

037 (LET ((REFS (SETDIFF (CNODEXREFS B) (CLAMBOAWARS CFM)))) 

038 (ALTER-CNODE CNODE (REFS := REFS))))) 

039 (CONTINUATION 

040 (LET ((B (CONTINUATIONXBODY CFM))) 

041 (CENV-ANALYZE B (CONS (CONTINUATIONWAR CFM) ENV) NIL) 

042 (LET ((REFS (REMOVE (CONTINUATIONWAR CFM) (CNODEXREFS B)))) 

043 (ALTER-CNODE CNODE (REFS := REFS))))) 

044 (CIF 

045 (LET ((PREO (CIF\PRED CFM)) 

046 (CON (CIFXCON CFM)) 

047 (ALT (CIF\ALT CFM))) 

048 (CENV-ANALYZE PRED ENV NIL) 

049 (CENV-ANALYZE CON ENV NIL) 

050 (CENV-ANALYZE ALT ENV NIL) 

051 (ALTER-CNODE CNODE 

052 (REFS := (UNION (CNODEXREFS PRED) 

053 (UNION (CNODEXREFS CON) 

054 (CNODEXREFS ALT))))))) 

055 (CASET 

056 (LET ((V (CASETWAR CFM)) 

057 (CN (CASETXCONT CFM)) 

058 (B (CASETXBODY CFM))) 

059 (PUTPROP (CASETWAR CFM) T 'VARIABLE-REFP) 

060 (CENV-ANALYZE CN ENV T) 

061 (CENV-ANALYZE B ENV NIL) 

062 (ALTER-CNODE CNODE 

063 (REFS := (LET ((R (UNION (CNODEXREFS CN) 

064 (CNODEXREFS B)))) 

065 (IF (MEMQ V ENV) (ADJOIN V R) R)))))) 

066 (CLABELS 

067 (LET ((LENV (APPEND (CLABELSXFNVARS CFM) ENV))) 

068 (DO ((F (CLABELSXFNDEFS CFM) (CDR F)) 

069 (R NIL (UNION R (CNODEXREFS (CAR F))))) 
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70 ((NULL F) 

071 (LET ((B (CLABELSNBODY CFM))) 

072 (CENV-ANALYZE B LENV NIL) 
73 (ALTER-CNODE CNOOE 

074 (REFS := (SETDIFF (UNION R (CNODENREFS B)) 

075 (CLABELS\FNVARS CFM)))))) 

076 (CENV-ANALYZE (CAR F) LENV NIL)))) 
77 (CCOMBINATION 

078 (LET ((ARGS (CCOMBINATION\ARGS CFM))) 

79 (CENV-ANALYZE (CAR ARGS) ENV T) 

080 (COND ((AND (EQ (TYPE (CNODEVCFORM (CAR ARGS))) 'TRIVIAL) 

081 (EQ (TYPE (NOOEXFORM (TRIVIAL\NODE 

082 (CNOOENCFORM (CAR ARGS))))) 

083 'VARIABLE) 

084 (TRIVFN (VARIABLEWAR 

085 (NODENFORM 

086 (TRIVIAL \NODE 

087 (CNODEXCFORM 

088 (CAR ARGS))))))) 

089 (CENV-ANALYZE (CADR ARGS) ENV T) 

090 (CENV-CCOMBINATION-ANALYZE CNODE 

091 ENV 

092 (CODR ARGS) 

093 (UNION (CNO0E\REFS (CAR ARGS)) 

094 (CNODEXREFS (CADR ARGS))))) 

095 (T (CENV-CCOMBINATION-ANALYZE CNODE 

096 ENV 

097 (CDR ARGS) 

098 (CNODENREFS (CAR ARGS))))))) 

099 (RETURN 

100 (LET ((C (RETURN\CONT CFM)) 

101 (V (RETURNWAL CFM))) 

102 (CENV-ANALYZE C ENV T) 

103 (CENV-ANALYZE V ENV NIL) 

104 (ALTER-CNODE CNODE 

105 (REFS := (UNION (CNODENREFS C) (CNODENREFS V)))))))))) 
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The only purpose of CENV-TRIV-ANALYZE is to go through the code for a 

TRIVIAL cnode, looking for variables occurring in other than function position, 

in order to put appropriate VARIABLE-REFP properties. Notice that the types 

LAMBDA and LABELS do not occur in the EQCASE expression, as nodes of those types 
can never occur in trivial expressions. 

CENV-CCOMBINATION-ANALYZE is a simple routine which analyzes COMBINATION 
cnodes; it is a separate routine only because it is used in more than one place 
in CENV-ANALYZE. It could have been made a local subroutine by using a LABELS in 
CENV-ANALYZE, but I elected not to do so for purely typographical reasons. 
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002 ;;; THIS FUNCTION MUST GO THROUGH AND LOCATE VARIABLES APPEARING IN NONFUNCTION POSITION. 
003 

004 (DEFINE CENV-TRIV-ANALYZE 

005 (LAMBDA (NODE FNP) 

006 (LET ((FM (NODEXFORM NODE))) 

007 (EQCASE (TYPE FM) 

008 (CONSTANT NIL) 

009 (VARIABLE 

010 (OR FNP (PUTPROP (VARIABLEWAR FM) T ' VARIABLE-REFP) ) ) 
Oil (LAMBDA 

012 (OR FNP 

013 (ERROR 'ITrivial closure - CENV-TRIV-ANALYZE | NODE 'FAIL-ACT)) 

014 (CENV-TRIV-ANALYZE (LAMBDANBODY FM) NIL)) 

015 (IF 

016 (CENV-TRIV-ANALYZE (IFXPRED FM) NIL) 

017 (CENV-TRIV-ANALYZE ( IF\CON FM) NIL) 

018 (CENV-TRIV-ANALYZE ( IF\ALT FM) NIL)) 

019 (ASET 

020 (PUTPROP (ASETWAR FM) T 'VARIABLE-REFP) 

021 (CENV-TRIV-ANALYZE (ASEABODY FM) NIL)) 

022 (COMBINATION 

023 (DO ((A (COMBINATIONXARGS FM) (CDR A)) 

024 (F T NIL)) 

025 ((NULL A)) 

026 (CENV-TRIV-ANALYZE (CAR A) F ))))))) 
027 

028 (DEFINE CENV-CCOMBINATION-ANALYZE 

029 (LAMBDA (CNODE ENV ARGS FREFS) 

030 (DO ((A ARGS (CDR A)) 

031 (R FREFS (UNION R (CNODEXREFS (CAR A))))) 

032 ((NULL A) 

033 (ALTER-CNODE CNODE (REFS := R))) 

034 (CENV-ANALYZE (CAR A) ENV NIL)))) 
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The binding analysis is the most complicated phase of pass 2. It 
determines for each function whether or not a closure structure will be needed 
for it at run time (and if so, whether the closure structure must contain a 
pointer to the code); it determines for each variable whether or not it can be 
referred to by a run-time closure structure; and it determines for each function 
how arguments will be passed to it (because for internal functions not apparent 
to the "outside world", any arbitrary argument-passing convention may be adopted 
by the compiler to optimize register usage; in particular, arguments which are 
never referred to need never even be actually passed). If flow analysis 
determines that a given variable always denotes (a closure of) a given functional 
(CLAMBDA) expression, then a KNOWN- FUNCTION property is created to connect the 
variable directly to the function for the benefit of the code generator. 

BIND-ANALYZE is just a simple dispatch to one of many specialists, one 
for each type of CNODE. TRIVIAL and CVARIABLE cnodes are handled directly 
because they are simple. 

The argument FNP is NIL, EZCLOSE, or NOCLOSE, depending respectively on 
whether a full closure structure, a closure structure without a code pointer, or 
no closure structure will be needed if in fact CNODE turns out to be of type 
CLAMBDA (or CONTINUATION). Normally it is NIL, unless determined otherwise by a 
parent CLABELS or COMBINATION cnode. 

The argument NAME is meaningful only if the CNODE argument is of type 
CLAMBDA or CONTINUATION. If non-NIL, it is a suggested name to use for the 
cnode. This name will later be used by the code generator as a tag. The only 
reason for using the suggestion rather than a generated name (and in fact one 
will be generated if the suggested name is NIL) is to make it easier to trace 
things while debugging. 

REFD-VARS is a utility routine. Given a set of variables, it returns the 
subset of them that are actually referenced (as determined by the READ-REFS and 
WRITE-REFS properties which were set up by ENV-ANALYZE and CENV-ANALYZE). 
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BINDING ANALYSIS. 



FOR EACH CNODE WE FILL IN: 

CLOVARS THE SET OF VARIABLES REFERRED TO BY CLOSURES 

AT OR BELOW THIS NODE (SHOULD ALWAYS BE A 
SUBSET OF REFS) 
FOR EACH CLAMBDA AND CONTINUATION WE FILL IN: 

FNP NON-NIL IFF REFERENCED ONLY AS A FUNCTION. 

WILL BE 'EZCLOSE IF REFERRED TO BY A CLOSURE, 
AND OTHERWISE 'NOCLOSE. 
TVARS VARIABLES PASSEO THROUGH TEMP LOCATIONS WHEN CALLING 

THIS FUNCTION 
NAME THE NAME OF THE FUNCTION (USED FOR THE PROG TAG) 

FOR EACH CLABELS WE FILL IN: 

EASY REFLECTS FNP STATUS OF ALL THE LABELLED FUNCTIONS 

FOR EACH VARIABLE WHICH ALWAYS DENOTES A CERTAIN FUNCTION WE 
PUT THE PROPERTIES: 

KNOWN-FUNCTION IFF THE VARIABLE IS NEVER ASET 

THE VALUE OF THE KNOWN-FUNCTION PROPERTY IS THE CNODE FOR 
THE FUNCTION DEFINITION. 

FOR EACH LABELS VARIABLE IN A LABELS OF THE 'EZCLOSE VARIETY 
WE PUT THE PROPERTY: 
LABELS-FUNCTION 
TO INDICATE THAT ITS "EASY" CLOSURE MUST BE CDR'D TO GET THE 
CORRECT ENVIRONMENT (SEE PRODUCE-LABELS). 

;;; NAME, IF NON-NIL, IS A SUGGESTED NAME FOR THE FUNCTION 

(DEFINE BIND-ANALYZE 

(LAMBDA (CNODE FNP NAME) 

(LET ((CFM (CNODENCFORM CNODE))) 
(EQCASE (TYPE CFM) 
(TRIVIAL 

(ALTER-CNODE CNODE (CLOVARS := NIL))) 
(CVARIABLE 

(ALTER-CNODE CNODE (CLOVARS := NIL))) 
(CLAMBDA 

(BIND-ANALYZE-CLAMBDA CNODE FNP NAME CFM)) 
(CONTINUATION 

(BIND-ANALYZE.-CONTINUATION CNODE FNP NAME CFM)) 
(CIF 

(BIND-ANALYZE-CIF CNODE CFM)) 
(CASET 

(BIND-ANALYZE-CASET CNODE CFM)) 
(CLABELS 

(BIND-ANALYZE-CLABELS CNODE CFM)) 
( COMBINATION 

(BIND-ANALYZE-CCOMBINATION CNODE CFM)) 
(RETURN 

(BIND-ANALYZE-RETURN CNODE CFM)))))) 

(DEFINE REFD-VARS 

(LAMBDA (VARS) 

(DO ((V VARS (CDR V)) 

(W NIL (IF (OR (GET (CAR V) -READ-REFS) 

(GET (CAR V) 'WRITE-REFS)) 
(CONS (CAR V) W) 
W))) 
((NULL V) (NREVERSE W))))) 
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For a CLANBDA cnode, BIND-ANALYZE-CLAMBDA first analyzes the body. The 
CLOVARS component of the cnode is then calculated. If the CLAMBDA will have a 
run-time closure structure created for it, then any variable it references is 
obviously referred to by a closure. Otherwise, only the CLOVARS of its body are 
included in the set. 

The TVARS component is the set of parameters for which arguments will be 
passed in a non-standard manner. Non-standard argument-passing is used only for 
NOCLOSE-type functions (though in principle it could also be used for EZCLOSE- 
type functions also). In this case, only referenced variables (as determined by 
REFD-VARS) are actually passed. The code generator uses TVARS for two purposes: 
when compiling the CLAMBDA itself, TVARS is used to determine which arguments are 
in which registers; and when compiling calls to the function, TVARS determines 
which registers to load (see LAMBDACATE). 

The FNP slot is just filled in using the FNP parameter. If a name was 
not suggested for the NAME slot, an arbitrary name is generated. 

BIND-ANALYZE-CONTINUATION is entirely analogous to BIND-ANALYZE-CLAMBDA. 

BIND-ANALYZE-CIF straightforwardly analyzes recursively its sub-cnodes, 
and then passes the union of their CLOVARS up as its own CLOVARS. 

BIND-ANALYZE-CASET tries to be a little bit clever about the obscure case 
produced by code such as: 

(ASET 1 FOO (LAMBDA ...)) 

where the continuation is a CONTINUATION cnode (rather than a CVARIABLE). It is 
then known that the variable bound by the CONTINUATION (not the variable set by 
the CASETM) will have as its value the (closure of the) CLAMBDA-expression. 
This allows for the creation of a KNOWN -FUNCTION property, etc. This analysis is 
very similar to that performed by BIND-ANALYZE-RETURN (see below). Aside from 
this, the analysis of a CASET is simple; the CLOVARS component is merely the 
union of the CLOVAR slots of the sub-cnodes. 
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(DEFINE BIND-ANALYZE-CLAMBOA 

(LAMBDA (CNODE FNP NAME CFM) 

(BLOCK (BIND-ANALYZE (CLAMBDA\BODY CFM) NIL NIL) 
(ALTER-CNODE CNODE 

(CLOVARS := (IF (EQ FNP 'NOCLOSE) 

(CNODEXCLOVARS (CLAMBDA\BODY CFM)) 
(CNODEXREFS CNODE)))) 
(ALTER-CLAMBDA CFM 

(FNP := FNP) 

(TVARS := (IF (EQ FNP 'NOCLOSE) 

(REFD-VARS (CLAMBDAWARS CFM)) 
NIL)) 
(NAME := (OR NAME (GENTEMP 'F ))))))) 

(DEFINE BIND-ANALYZE-CONTINUATION 

(LAMBDA (CNODE FNP NAME CFM) 

(BLOCK (BIND-ANALYZE (CONTINUATION\BODY CFM) NIL NIL) 
(ALTER-CNODE CNODE 

(CLOVARS := (IF (EQ FNP 'NOCLOSE) 

(CNODEXCLOVARS (CONTINUATION\BODY CFM)) 
(CNODEXREFS CNODE)))) 
(ALTER-CONTINUATION CFM 

(FNP :«= FNP) 

(TVARS := (IF (EQ FNP 'NOCLOSE) 

(REFD-VARS (LIST (CONTINUATIONWAR CFM))) 
NIL)) 
(NAME := (OR NAME (GENTEMP 'C))))))) 

(DEFINE BIND-ANALYZE-CIF 

(LAMBDA (CNODE CFM) 

(BLOCK (BIND-ANALYZE (CIF\PRED CFM) NIL NIL) 
(BIND-ANALYZE (CIFXCON CFM) NIL NIL) 
(BIND-ANALYZE (CIF\ALT CFM) NIL NIL) 
(ALTER-CNODE CNODE 

(CLOVARS := (UNION (CNODEXCLOVARS (CIF\PRED CFM)) 

(UNION (CNODEXCLOVARS (CIFXCON CFM)) 

(CNOKXCLOVARS (CIFXALT CFM))))))))) 

(DEFINE BIND-ANALYZE-CASET 
(LAMBDA (CNOOE CFM) 

(LET ((CN (CASETXCONT CFM)) 

(VAL (CASETXBODY CFM))) 
(BIND-ANALYZE CN 'NOCLOSE NIL) 

(COND ((AND (EQ (TYPE (CNODEXCFORM CN)) 'CONTINUATION) 
(EQ (TYPE (CNODEXCFORM VAL)) 'CLAMBDA)) 
(LET ((VAR (CONTINUATIONWAR (CNODEXCFORM CN)))) 
(PUTPROP VAR VAL 'KNOWN-FUNCTION) 
(BIND-ANALYZE VAL 

(AND (NOT (GET VAR ' VARIABLE-REFP) ) 
(IF (MEMQ VAR 

(CNODEXCLOVARS 
(CONTINUATIONXBODY 
(CNODEXCFORM CN)))) 
'EZCLOSE 

(BLOCK (ALTER-CONTINUATION (CNODEXCFORM CN) 

(TVARS :« NIL)) 
'NOCLOSE))) 
NIL))) 
(T (BIND-ANALYZE VAL NIL NIL))) 
(ALTER-CNODE CNODE 

(CLOVARS := (UNION (CNODEXCLOVARS CN) 

(CNODEXCLOVARS VAL))))))) 
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The binding analysis of a CLABELS is very tricky because of the 
possibility of mutually referent functions. For example, suppose a single 
CLABELS binds two CLAMBDA expressions with names F00 and BAR. Suppose that the 
body of F00 refers to BAR, and that of BAR to F00. Should F00 and BAR be of FNP- 
type NIL, EZCLOSE, or NOCLOSE? If either is of type EZCLOSE, then the other must 
be also; but the decision cannot be made sequentially. It is even more 
complicated if one must be of type NIL. 

An approximate solution is used here, to prevent having to solve 
complicated simultaneous constraints. It is arbitrarily decreed that all 
functions of a single CLABELS shall all have the same FNP type. If any one must 
be of type NIL, then they all are. Otherwise, it is tentatively assumed that 
they all may be of type NOCLOSE. If this assumption is disproved, then the 
analysis is retroactively patched up. 

The outer DO loop of BIND-ANALYZE-CLABELS creates KNOWN- FUNCTION 
properties, and determines (in the variable EZ) whether any of the labelled 
functions needs a full closure structure. (This can be done before analyzing the 
functions, because it is determined entirely by the VARIABLE-REFP properties 
created in the previous phase.) The inner DO loop then analyzes the functions. 
When this is done, if EZ is NOCLOSE, and it turns out that it should have been 
EZCLOSE after all, then the third DO loop forcibly patches the CLAMBDA cnodes for 
the labelled functions, and the AMAPC form creates LABELS-FUNCTION properties as 
a flag for the code generator. 

BIND-ANALYZE-RETURN simply analyzes the continuation and return value 
recursively, and then merges to two CLOVARS sets to produce its own CLOVARS set. 
A special case is when the two sub-cnodes are respectively a CONTINUATION and a 
CLAMBDA; then special work is done because it is known that the variable bound 
by the CONTINUATION will always denote the (closure of the) CLAMBDA-expression. 
A nasty trick is that if it turns out that the CLAMBDA can be of type NOCLOSE, 
then the TVARS slot of the CONTINUATION is forcibly set to NIL (i.e. the empty 
set). This is because no argument will really be passed. (This fact is also 
known by the LAMBDACATE routine in the code generator.) 
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002 (DEFINE BIND-ANALYZE-CLABELS 

003 (LAMBDA (CNODE CFM) 

004 (BLOCK (BIND-ANALYZE (CLABELS\BODY CFM) NIL NIL) 

005 (DO ((V (CLABELS\FNVARS CFM) (COR V)) 

006 (D (CLABELSXFNDEFS CFM) (CDR D)) 

007 (EZ 'NOCLOSE (AND (NULL (GET (CAR V) 'VARIABLE-REFP) ) EZ))) 

008 ((NULL V) 

009 (ALTER-CLABELS CFM (EASY := EZ)) 

010 (DO ((V (CLABELSXFNVARS CFM) (CDR V)) 
Oil (D (CLABELSXFNDFJFS CFM) (CDR D)j 

012 (CV (CNODEXCLOVARS (CLABELS\BODY CFM)) 

013 (UNION CV (CNODEXCLOVARS (CAR D))))) 

014 ((NULL D) 

015 (ALTER-CNODE CNODE (CLOVARS := CV)) 

016 (COND ((AND EZ (INTERSECT CV (LABELSXFNVARS CFM))) 

017 (DO ((D (CLABELSXFNDEFS CFM) (CDR D)) 

018 (CV (CNODEXCLOVARS (CLABELSXBODY CFM)) 

019 (UNION CV (CNODEXCLOVARS (CAR D))))) 

020 ((NULL D) 

021 (ALTER-CNODE CNODE (CLOVARS := CV))) 

022 (ALTER-CLAMBDA (CNODEXCFORM (CAR D)) 

023 (FNP := 'EZCLOSE) 

024 (TVARS := NIL)) 

025 (ALTER-CNODE (CAR D) 

026 (CLOVARS := (CNODEXREFS (CAR D))))) 

027 (AMAPC (LAMBDA (V) (PUTPROP V T 'LABELS-FUNCTION)) 

028 (CLABELSXFNVARS CFM)) 

029 (ALTER-CLABELS CFM (EASY := 'EZCLOSE))))) 

030 (BIND-ANALYZE (CAR D) EZ (CAR V)))) 

031 (PUTPROP (CAR V) (CAR D) 'KNOWN-FUNCTION))))) 
032 

033 (DEFINE BIND-ANALYZE-RETURN 

034 (LAMBDA (CNODE CFM) 

035 (LET ((CN (RETURNXCONT CFM)) 

036 (VAL (RETURNXVAL CFM))) 

037 (BIND-ANALYZE CN 'NOCLOSE NIL) 

038 (COND ((AND (EQ (TYPE (CNODEXCFORM CN)) 'CONTINUATION) 

039 (EQ (TYPE (CNODEXCFORM VAL)) 'CLAMBDAi)) 

040 (LET ((VAR (CONTINUATIONXVAR (CNODEXCFORM CN) )) ) 

041 (PUTPROP VAR VAL 'KNOWN-FUNCTION) 

042 (BIND-ANALYZE VAL 

043 (AND (NOT (GET VAR 'VARIABLE-REFP)) 

044 (IF (MEMO VAR 

045 (CNODEXCLOVARS 

046 (CONTINUATIONXBODY 
04 7 (CNODEXCFORM CN)))) 

048 'EZCLOSE 

049 (BLOCK (ALTER-CONTINUATION (CNODEXCFORM CN) 

050 (TVARS := NIL)) 

051 'NOCLOSE))) 

052 NIL))) 

053 (T (BIND-ANALYZE VAL NIL NIL))) 

054 (ALTER-CNODE CNODE 

055 (CLOVARS := (UNION (CNODEXCLOVARS CN) 

056 (CNODEXCLOVARS VAL))))))) 
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BIND-ANALYZE-CCONBINATION first analyzes the function position of the 
combination. It then distinguishes three cases: a trivial function, a CLAMBDA- 
expression function, and all others. 

In the case of a trivial function, the continuation (which is the second 
item in ARGS) can be analyzed with FNP = NOCLOSE, because the compilation will 
essentially turn into "calculate all other arguments, apply the trivial function, 
and then give the result to the continuation". A COMBINATION which looks like: 

(a-trivial-function (CONTINUATION (var) ...) argl ... argn) 
is compiled almost as if it were: 

((CONTINUATION (var) ...) (a-trivial-function argl ... argn)) 

and of course the continuation can be treated as of type NOCLOSE. 

In the case of a CLANBDA-expression, the arguments are all analyzed, and 
then the AMAPC expression goes back over the TVARS list of the CLAMBDA, and 
removes from the TVARS set each variable corresponding to an argument which the 
analysis has proved to be a NOCLOSE-type KNOWN -FUNCTION. This is because no 
actual argument will be passed at run time for such a function, and so there is 
no need to allocate a register through which to pass that argument. 

In the third case, the arguments are analyzed straightforwardly by BIND- 
CCOMBINATION-ANALYZE. 

BIND-CCONBINATION-ANALYZE does the dirty work of analyzing arguments of a 
CCOMBINATION and updating the CLOVARS slot of the COMBINATION cnode. If VARS is 
non-NIL, then it is the variables of the CLANBDA-expression which was in the 
function position of the CCOMBINATION. As the arguments are analyzed, KNOWN- 
FUNCTION properties are put on the variables as appropriate, and the correct 
value of FNP is determined for the recursive call to BIND-ANALYZE. If VARS is 
NIL, then this code depends on the fact that (CDR NIL)=NIL in NacLISP. 
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002 (DEFINE BIND-ANALYZE-CCOMBINATION 

003 (LAMBDA (CNODE CFM) 

004 (LET ((ARGS (CCOMBINATION\ARGS CFM))) 

005 (BIND-ANALYZE (CAR ARGS) 'NOCLOSE NIL) 

006 (LET ((FN (CNODE\CFORM (CAR ARGS)))) 

007 (COND ((AND (EQ (TYPE FN) 'TRIVIAL) 

008 (EQ (TYPE (NODE\FORM (TRIVIALNNODE FN))) 

009 'VARIABLE) 

010 (TRIVFN (VARIABLEWAR (NODE\FORM (TRIVIAL\NODE FN))))) 
Oil (BIND-ANALYZE (CADR ARGS) 'NOCLOSE NIL) 

012 (BIND-CCOMBINATION-ANALYZE CNODE 

013 (CDDR ARGS) 

014 NIL 

015 (CNODEXCLOVARS (CADR ARGS)))) 

016 ((EQ (TYPE FN) 'CLAMBDA) 

017 (BIND-CCOMBINATION-ANALYZE CNODE 

018 (CDR ARGS) 

019 (CLAMBDAWARS FN) 

020 (CNODENCLOVARS (CAR ARGS))) 

021 (AMAPC (LAMBDA (V) 

022 (IF (LET ((KFN (GET V 'KNOWN -FUNCT ION) ) ) 

023 (AND KFN 

024 (EQ (EQCASE (TYPE (CNODEXCFORM KFN)) 

025 (CLAMBDA 

026 (CLAMBDA\FNP 

027 (CNODE\CFORM KFN))) 

028 (CONTINUATION 

029 (CONTINUATIONNFNP 

030 (CNODENCFORM KFN)))) 

031 'NOCLOSE))) 

032 (ALTER-CLAMBDA 

033 FN 

34 (TVARS :» (DELQ V (CLAMBDA\TVARS FN)))))) 

035 (CLAMBDAYTVARS FN))) 

036 (T (BIND-CCOMBINATION-ANALYZE CNODE 

037 (CDR ARGS) 

038 NIL 

039 (CNODE\CLOVARS (CAR ARGS))))))))) 
040 

041 ;;; VARS MAY BE NIL - WE DEPEND ON (CDR NIL)=NIL. 
042 

043 (DEFINE BIND-CCOMBINATION-ANALYZE 

044 (LAMBDA (CNODE ARGS VARS FCV) 

045 (DO ((A ARGS (CDR A)) 

046 (V VARS (CDR V)) 

047 (CV FCV (UNION CV (CNODE\CLOVARS (CAR A))))) 

048 ((NULL A) 

049 (ALTER-CNODE CNODE (CLOVARS := CV))) 

050 (COND ((AND VARS 

051 (MEMQ (TYPE (CNODENCFORM (CAR A))) '(CLAMBDA CONTINUATION)) 

052 (NOT (GET (CAR V) ' WRITE-REFS) )) 

053 (PUTPROP (CAR V) (CAR A) 'KNOWN-FUNCTION) 

054 (BIND-ANALYZE (CAR A) 

055 (AND (NOT (GET (CAR V) ' VARIABLE-REFP) ) 

056 (IF (MEMQ (CAR V) FCV) 

057 'EZCLOSE 

058 'NOCLOSE)) 

059 NIL)) 

060 (T (BIND-ANALYZE (CAR A) NIL NIL)))))) 
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DEPTH-ANALYZE allocates registers through which to pass arguments to 
NOCLOSE functions, i.e. for arguments corresponding to elements of TVARS sets. 
An unclever stack discipline is used for allocating registers. Each function is 
assigned a "depth", which is zero for a function whose FNP is NIL or EZCLOSE 
(such functions take their arguments in the standard registers **ONE** through 
**EIGHT**, assuming that **NUMBER-OF-ARG-REGS** is 8, as it is in the current 
SCHEME implementation). For a NOCLOSE function the depth is essentially the 
depth of the function in whose body the NOCLOSE function appears, plus the number 
of TVARS belonging to that other function (if it is of type NOCLOSE) or the 
number of standard argument registers used by it (if it is NIL or EZCLOSE). For 
example, consider this code: 

(CLAMBDA (C X Y) 

((CLAMBDA (K F Z) 

((CLAMBDA (Q W V) 

...) 
CONT-57 '3 «4)) 
(CONTINUATION (V) ...) 
(CLAMBDA (H) ...) 
TOO)) 

Suppose that the outer CLAMBDA is of type EZCLOSE for some reason. Its depth is 
0. The two CLAMBDA-expressions and CONTINUATION immediately within it have depth 
3 (assuming the CONTINUATION and second CLAMBDA are of type NOCLOSE -- the first 
CLAMBDA definitely is). The innermost CLAMBDA is then of depth 4 (for Z, which 
will be in TVARS -- K and F will not be because they are names for NOCLOSE 
functions, assuming K and F have no WRITE-REFS properties). 

To each function is also attached a MAXDEP value, which is in effect the 
number of registers used by that function, including all NOCLOSE functions within 
it. This is used in only one place in the code generator, to generate a SPECIAL 
declaration for the benefit of the MacLISP compiler, which compiles the output of 
RABBIT. For most constructs this is simply the numerical maximum over the depths 
of all sub-cnodes. Toward this end the maximum depth of the cnode is returned as 
the value of DEPTH-ANALYZE. 



FOR EACH CLAMBDA AND CONTINUATION WE FILL IN: 

DEP OEPTH OF TEMP VAR USAGE AT THIS POINT 

MAXDEP MAX DEPTH BELOW THIS POINT 
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002 ;;; DEPTH ANALYSIS FOR CPS VERSION. 
003 
004 
005 
006 
007 

008 ; ; ; VALUE OF DEPTH-ANALYZE IS THE MAX DEPTH 

009 

010 (DEFINE DEPTH-ANALYZE 

Oil (LAMBDA (CNODE DEP) 

012 (LET ((CFM (CNODE\CFORM CNODE))) 

013 (EQCASE (TYPE CFM) 

014 (TRIVIAL DEP) 

015 (CVARIABLE DEP) 

016 (CLAMBDA 

017 (LET ((MD (DEPTH-ANALYZE (CLAMBDA\BODY CFM) 

018 (IF (EQ (CLAMBDA\FNP CFM) 'NOCLOSE) 

019 (+ DEP (LENGTH (CLAMBDA\TVARS CFM))) 

020 (MIN (LENGTH (CLAMBDAWARS CFM)) 

021 (+ 1 **NUMBER-OF-ARG-REGS**)))))) 

022 (ALTER-CLAMBDA 

023 CFM 

024 (DEP := (IF (EQ (CLAMBDA\FNP CFM) 'NOCLOSE) DEP 0)) 

025 (MAXDEP := MD)) 

026 MD)) 

027 (CONTINUATION 

028 (LET ((MD (DEPTH-ANALYZE 

029 (CONTINUATION\BODY CFM) 

030 (IF (EQ (CONTINUATION\FNP CFM) 'NOCLOSE) 

031 (+ DEP (LENGTH (CONTINUATION\TVARS CFM))) 

032 2)))) 

033 (ALTER-CONTINUATION 

034 CFM 

035 (DEP := (IF (EQ (CONTINUATK)N\FNP CFM) 'NOCLOSE) DEP 0)) 

036 (MAXDEP := MD)) 

037 MD)) 

038 (CIF 

039 (MAX (DEPTH-ANALYZE (CIF\PRED CFM) DEP) 

040 (DEPTH-ANALYZE (CIF\CON CFM) DEP) 

041 (DEPTH-ANALYZE (CIF\ALT CFM) DEP))) 

042 (CASET 

043 (MAX (DEPTH-ANALYZE (CASET\CONT CFM) DEP) 

044 (DEPTH-ANALYZE (CASET\BODY CFM) DEP))) 

045 (CLABELS 

046 (LET ((DP (IF (EQ (CLABELS\EASY CFM) 'NOCLOSE) 

047 DEP 

048 (+ DEP (LENGTH (CLABELS\FNVARS CFM)))))) 

049 (DO ((D (CLABELSNFNDEFS CFM) (CDR D)) 

050 (MD (DEPTH-ANALYZE (CLABELSNBODY CFM) DP) 

051 (MAX MD (DEPTH-ANALYZE (CAR D) DP)))) 

052 ((NULL D) MD)))) 

053 ^COMBINATION 

054 (DO ((A (CCOMBINATIONNARGS CFM) (COR A)) 

055 (MD (MAX MD (DEPTH-ANALYZE (CAR A) DEP)))) 

056 ((NULL A) MD))) 

057 (RETURN 

058 (MAX (DEPTH-ANALYZE (RETURN\CONT CFM) DEP) 

059 (DEPTH-ANALYZE (RETURNWAL CFM) DEP))))))) 
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Just as DEPTH-ANALYZE assigns locations in registers ("stack locations") 
for variables, so CLOSE-ANALYZE assigns locations in consed ("heap-allocated") 
environment structures for variables. The general idea is that if the value of a 
an accessible variable is not in a register, then it is in the structure which is 
in the register **ENV**. This structure can in principle be any structure 
whatsoever, according to the whim of the compiler. RABBIT'S whim is to be very 
unclever; the structure of **ENV** is always a simple list of variable values. 
Thus a variable in the **ENV** structure is always accessed by a series of CDR 
operations and then one CAR operation. 

(More clever would be to maintain the environment as a chained list of 
vectors, each vector representing a non-null contour. Then a variable could be 
accessed by a series of "CDR" operations equal to the number of contours (rather 
than the number of variables) between the binding and the reference, followed by 
a single indexing operation into the contour-vector. The number of "CDR" 
operations could be reduced by having a kind of "cache" for the results of such 
contour operations; such a cache would in fact be equivalent to the "display" 
used in many Algol implementations. If such a display were maintained, a 
variable could be accessed simply by a two-level indexing operation.) 

Within the compiler an environment structure is also represented as a 
simple list, with the name of a variable occupying the position which its value 
will occupy in the run-time environment. 

For every CLAMBDA, CONTINUATION, and CLABELS, a slot called CONSENV is 
filled in, which is a list representing what the environment structure will look 
like when the closure(s) for that construct are to be constructed, if any This 
is done by walking over the cnode-tree and doing to the environment 
representation precisely what will be done to the real environment at run time. 

There is a problem with the possibility that a variable may initially be 
in a register (because it was passed as an argument, for example), but must be 
transferred to a consed environment structure because the variable is referred to 
by the code of a closure to be constructed. There are two cases: either the 
variable has no WRITE-REFS property, or it does. 

If it does not, then there is no problem with the value of the variable 
being in two or more places, so it is simply copied and consed into the 
environment as necessary. The CLOSEREFS slot of a function is a list of such 
variables which must be added to the consed environment before constructinq the 
closure. 

If the variable does have WRITE-REFS, then the value of the variable must 
have a single "home", to prevent inconsistencies when it is altered (This is 
far easier than arranging for every ASET' operation to update all extant copies 
of a variable's value.) It is arranged that such variables, if they are referred 
to be closures (are in the CLOVARS set of the CLANBDA which binds them) will 
exist only in the consed environment. Thus for each CLAMBDA the ASETVARS set is 
that subset of the lambda variables which have WRITE-REFS and are in the CLOVARS 
set. Before the body of the CLANBDA is executed, a piece of code inserted by the 
code generator will transfer the variables from their registers immediately into 
the consed environment, and the values in the registers are thereafter never 
referred to. 
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CLOSURE ANALYSIS FOR CPS VERSION 



FOR EACH CLAMBDA, CONTINUATION, AND CLABELS WE FILL IN: 

CONSENV THE CONSED ENVIRONMENT OF THE CLAMBDA. 
CONTINUATION, OR CLABELS (BEFORE ANY 
CLOSEREFS HAVE BEEN CONSED ON) 

FOR EACH CLAMBDA AND CONTINUATION WE FILL IN: 

CLOSEREFS A LIST OF VARIABLES REFERENCED BY THE CLAMBDA 
OR CONTINUATION WHICH ARE NOT IN THE CONSED 
ENVIRONMENT AT THE POINT OF THE CLAMBDA OR 
CONTINUATION AND SO MUST BE CONSED ONTO THE 
ENVIRONMENT AT CLOSURE TIME; HOWEVER, THESE 
NEED NOT BE CONSED ON IF THE CLAMBDA OR 
CONTINUATION IS IN FUNCTION POSITION OF 
A FATHER WHICH IS A COMBINATION OR RETURN 

FOR THE CLAMBDA' S IN THE FNDEFS OF A CLABELS, THESE MAY BE 

SLIGHTLY ARTIFICIAL FOR THE SAKE OF OPTIMIZATION (SEE BELOW). 

FOR EACH CLAMBDA WE FILL IN: 

ASETVARS A LIST OF THE VARIABLES BOUND IN THE CLAMBDA 
WHICH ARE EVER ASET AND SO MUST BE CONSED 
ONTO THE ENVIRONMENT IMMEDIATELY IF ANY 
CLOSURES OCCUR IN THE BODY 

FOR EACH CLABELS WE FILL IN: 

VARIABLES TO BE CONSED ONTO THE CURRENT CONSENV 
BEFORE CLOSING THE LABELS FUNCTIONS 



FNENV 



;;; CENV IS THE CONSED ENVIRONMENT (A LIST OF VARIABLES) 

(DEFINE FILTER-CLOSEREFS 

(LAMBDA (REFS CENV) 

(DO ((X REFS (CDR X)) 
(Y NIL 

(IF (OR (MEMO (CAR X) CENV) 

(LET ((KFN (GET (CAR X) 'KNOWN-FUNCTION))) 
(AND KFN 

(EO (EOCASE (TYPE (CNODE\CFORM KFN)) 
(CLAMBDA 

(CLAMBDA\FNP (CNODENCFORM KFN))) 
(CONTINUATION 
(CONTINUATION\FNP (CNODEXCFORM KFN)))) 
•NOCLOSE)))) 
Y 

(CONS (CAR X) Y)))) 
((NULL X) (NREVERSE Y))))) 
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For each CLABELS a set called FNENV is computed. This is strictly an 
efficiency hack, which attempts to arrange it such that the several closures 
constructed for a CLABELS share environment structure. The union over all the 
variables needed is computed, and these variables are, at run time, all consed 
onto the environment before any of the closures is constructed. The hope is that 
the intersection of these sets is large, so that the total environment consing is 
less than if a separate environment were consed for each labelled closure. 

FILTER-CLOSEREF5 is a utility routine which, given a set of variables and 
an environment representation, returns that subset of the variables which are not 
already in the environment and so do not denote known NOCLOSE functions. (Those 
variables which are already in the consed environment or which do denote NOCLOSE 
functions of course need not be added to that consed environment.) 

The argument CENV to CLOSE-ANALYZE is the representation of the consed 
environment (in **ENV**) which will be present when the code for CNODE is 
executed. The only processing of interest occurs for CLAMBDA, CONTINUATION, and 
CLABELS cnodes. 

The CLOSEREFS of a CLAMBDA are those which are referred to by the CLAMBDA 
and which are not already in CENV, provided the CLAMBDA is not of type NOCLOSE. 
The ASETVARS are precisely those VARS which have WRITE-REFS and are in CLOVARS. 

The processing for a CONTINUATION is similar. As a consistency check, we 
make sure the bound variable has no WRITE-REFS (it should be impossible for an 
ASET' to refer to the bound variable of a CONTINUATION). 

For a CLABELS, the FNENV set is first calculated and added to CENV. This 
new CENV is then used to process the definitions and body of the CLABELS. 
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002 (DEFINE CLOSE-ANALYZE 

003 (LAMBDA (CNODE CENV) 

004 (LET ((CFM (CNODE\CFORM CNODE))) 

005 (EQCASE (TYPE CFM) 

006 (TRIVIAL NIL) 

007 (CVARIABLE NIL) 

008 (CLAMBDA 

009 (LET ((CR (AND (NOT (EQ (CLAMBDA\FNP CFM) 'NOCLOSE)) 

010 (FILTER-CLOSEREFS (CNODE\REFS CNODE) CENV))) 
Oil (AV (DO ((V (CLAMBDAWARS (CNODE\CFORM CNODE)) (CDR V)) 

012 (A NIL (IF (AND (GET (CAR V) 'WRITE-REFS) 

013 (MEMO (CAR V) 

014 ( CNODE XCLOVARS 

015 (CLAMBDAXBODY CFM)))) 

016 (CONS (CAR V) A) 

017 A))) 

018 ((NULL V) A)))) 

019 (ALTER-CLAMBDA CFM 

020 (CONSENV := CENV) 

021 (CLOSEREFS := CR) 
22 (ASETVARS := AV)) 

023 (CLOSE -ANALYZE (CLAMBDAXBODY CFM) 

024 (APPEND AV CR CENV)))) 

025 (CONTINUATION 

026 (AND (GET (CONTINUATIONWAR CFM) 'WRITE-REFS) 

027 (ERROR ' |How could an ASET refer to a continuation variable?! 

028 CNODE 

029 'FAIL-ACT)) 

030 (LET ((CR (AND (NOT (EQ (CONTINUATION\FNP CFM) 'NOCLOSE)) 

031 (FILTER-CLOSEREFS (CNODE\REFS CNODE) CENV)))) 

032 (ALTER-CONTINUATION CFM 

033 (CONSENV := CENV) 

034 (CLOSEREFS := CR)) 

035 (CLOSE-ANALYZE (CONTINUATION\BODY CFM) 

036 (APPEND CR CENV)))) 

037 (CIF 

038 (CLOSE-ANALYZE (CIF\PRED CFM) CENV) 

039 (CLOSE -ANALYZE (CIF\CON CFM) CENV) 

040 (CLOSE -ANALYZE (CIF\ALT CFM) CENV)) 

041 (CASET 

042 (CLOSE-ANALYZE (CASET\CONT CFM) CENV) 

043 (CLOSE-ANALYZE (CASETXBODY CFM) CENV)) 

044 (CLABELS 

045 ((LAMBDA (CENV) 

046 (BLOCK (AMAPC (LAMBDA (D) (CLOSE -ANALYZE D CENV)) 

047 (CLABELSXFNDEFS CFM)) 

048 (CLOSE-ANALYZE (CLABELSXBODY CFM) CENV))) 

049 (COND ((CLABELSXEASY CFM) 

050 (DO ((D (CLABELSXFNDEFS CFM) (COR D)) 

051 (R NIL (UNION R (CNODEXREFS (CAR D))))) 

052 ((NULL D) 

053 (LET ((E (FILTER-CLOSEREFS R CENV))) 

054 (ALTER-CLABELS CFM 

055 (FNENV := E) 

056 (CONSENV := CENV)) 

057 (APPEND E CENV))))) 

058 (T (ALTER-CLABELS CFM 

059 (FNENV :■ NIL) 

060 (CONSENV := CENV)) 

061 CENV)))) 

062 ^COMBINATION 

063 (AMAPC (LAMBDA (A) (CLOSE -ANALYZE A CENV)) 

064 (CCOMBINATIONXARGS CFM))) 

065 (RETURN 

066 (CLOSE -ANALYZE (RETURNXCONT CFM) CENV) 

067 (CLOSE -ANALYZE (RETURNXVAL CFM) CENV)))))) 



206 



of »n thl „ com \ to the code generator, which is altogether about one-fourth 
?L ♦ ,, de mak1 " 9 up RABBIT - Part of ^is is because much code which is 
conceptually singular is duplicated in several places (partly as a result of thP 

no e des n e are Or tre n at:d 1C di.^ C0M n NATI0N rf an ' RE ™ N " 0deS ' °' ""nBDA and CONtUaTION 
veryUsy to mak fi r„„i» r y i ^d also because a powerful text editor made it 
very easy to make copies of the code for various purposes!). The rest is iust 
because code generation is fairly tricky and requires checking for 'special cases 
A certain amount of peephole optimization is performed- this is not « .,1 ! ?A 
improve the efficiency of the output code, M \o .2. in, outpet cooe ..".? to 
ten tn t a i Uma " debu 9 gin 9 RABBIT - A large fraction of the output code (perhaps 

^™Z£^&Z ^t'ZgX' various — — d - & E 

to return"^^" fhe S3. =£d WT^Z c^e^e "^^ &*„? 
functions encountered in the cnode-tree, for which code is* to oeneratprt 
separately later. We solve this problem by a stylistic t"ck nameltthP 
explicit use of continuation-passing style. Hany functions in the .code generator 
take an argument named »C». This argument is itself a function of two arGument^ 
the generated code and the deferred-function list. The function which ?s i r 
is expected to compute its two results and then invoke "giving it th. JwS 
results as arguments. (In practice a function which gets an argument Cahoot 
an argument FNS, which is a deferred-functions list; the funcUon is exPecced to 

"along withThe general ZlT* "" ""»' "* ^ ** "-^ S "" *° 

CENV (a ^s^T ot^ne^m^nS "tE^E "'lisToescr^ng 
external functions compiled together in this "block" or "module" (this ?!„"£ ?n 
compile a direct GOTO rather than a more expensive call to an external function 

moduli w^Ttnf UerUSP ^-coeSf^ ST^*" * "'"^ 

and so any piece of code which constructs a closure will need to refer to thi 
value of this symbol); and RNL, the "rename list" an ,Hst »!„„<„, ? 

triable IT?* h° Pie " S ° f COde f ° r -«*•«"»«*"•■ (when ^/"o^referenTe^a 
b " t0 be generated, the piece of code in RNL is used if the variable is 

tTere'for-e'gToba^riro^ut 5 , 6 . 9 refere " Ce t0 "» Vari3ble — ™KX£: 11 

troo - C0NP JLATE is the topmost routine of the code generator. FN is the cnode- 

type ^z^r^^^ stirs ^tf^?^? z 1< 

initial RNL < rename list) for references to the arguments Also when COnP-BoS? 
has returned the code (the innermost LAHBDA-expression in COMPII at • i« ?? 
argument C given to COUP-BODY). SET-UP-ASETVARS i 'calle to ?ake care of CQDVln. 
TONTINUATlor ln * h ", ASETVARS S6t int0 the consed environment The code for I 
bound var able S1, " lla^ • "^ ""' 8 ™™™™ has no ASETVARS and only one 
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COQE GENERATION ROUTINES 



PROGNAME: NAME OF A VARIABLE WHICH AT RUN TIME WILL HAVE 

AS VALUE THE SUBR POINTER FOR THE PROG 
FN: THE FUNCTION TO COMPILE (A CLAMBDA OR CONTINUATION CNOOE) 

EXTERNALP: NON-NIL IF THE FUNCTION IS EXTERNAL 
RNL: INITIAL RENAME LIST (NON-NIL ONLY FOR NOCLOSE FNS). 

ENTRIES ARE: (VAR . CODE) 
BLOCKFNS: AN ALIST OF FUNCTIONS IN THIS BLOCK. 

ENTRIES ARE: (USERNAME CNOOE) 
FNS: A LIST OF TUPLES FOR FUNCTIONS YET TO BE COMPILED; 

EACH TUPLE IS (PROGNAME FN RNL) 
C: A CONTINUATION, TAKING: 

CODE: THE PIECE OF MACLISP CODE FOR THE FUNCTION 

FNS: AN AUGMENTED FNS LIST 

(DEFINE COMPILATE 

(LAMBDA (PROGNAME FN RNL BLOCKFNS FNS C) 
(LET ((CFM (CNODENCFORM FN))) 
(EOCASE (TYPE CFM) 
(CLAMBDA 
(LET ((CENV (APPEND (CLAMBDA\ASETVARS CFM) 
(CLAMBDA\CLOSEREFS CFM) 
(CLAMBDA\CONSENV CFM)))) 
(COMP-BOOY (CLAMBDA\BODY CFM) 

(REGSLIST CFM T (ENVCARCDR CENV RNL)) 

PROGNAME 

BLOCKFNS 

CENV 

FNS 

(LAMBDA (CODE FNS) 

(C (SET-UP-ASETVARS CODE 

(CLAMBDANASETVARS CFM) 
(REGSLIST CFM NIL NIL)) 
FNS))))) 
(CONTINUATION 
(LET ((CENV (APPEND (CONTINUATION\CLOSEREFS CFM) 
(CONTINUATIONXCONSENV CFM)))) 
(COMP-BODY (CONTINUATION\BODY CFM) 

(IF (EQ (CONTINUATIONNFNP CFM) 'NOCLOSE) 
(IF (NULL (CONTINUATIONXTVARS CFM)) 
(ENVCARCDR CENV RNL) 
(CONS (CONS (CONTINUATIONWAR CFM) 

(TEMPLOC (CONTINUATIONXDEP CFM))) 
(ENVCARCDR CENV RNL))) 
(CONS (CONS (CONTINUATIONWAR CFM) 

(CAR **ARGUMENT-REGISTERS**)) 



(ENVCARCDR CENV RNL))) 



PROGNAME 

BLOCKFNS 

CENV 

FNS 

C))))))) 
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**ARGUNENT-REGISTERS** is a list of the standard "registers" through 
which arguments are passed. In the standard SCHEME implementation this list is: 

(**ONE** **TWO** **THREE** **FOUR** 

**FIVE** **SIX** **SEVEN** **EIGHT**) 

DEPROGNIFY1 is a peephole optimizer. It takes a MacLISP form and returns 
a list of NacLISP forms. The idea is that if the given form is (PROGN . ..), the 
keyword PROGN is stripped off; also, any irrelevant computations (references to 
variables or constants other than in the final position) are removed. 
(ATONFLUSHP, when NIL, suppresses the removal of symbols, which in some cases may 
be MacLISP PROG tags). The purpose of this is to avoid multiple nesting of PROGN 
forms: 

(PROGN (PROGN a b) (PROGN (PROGN c (PROGN d e) f) g)) 

Any code generation routine which constructs a PROGN with a component Q generated 
by another routine generally says: 

"(PROGN (SETQ FOO 3) @(DEPROGNIFY Q) (GO ,THE-TAG)) 

The "I?" means that the list of forms returned by the call to DEPROGNIFY (which is 
actually a macro which expands into a call to DEPROGNIFY1) is to be substituted 
into the list (PROGN....) being constructed by the "" operator. Thus rather 
than the nested PROGN code shown above, the code generator would instead produce: 

(PROGN a b c d e f g) 
which is much easier to read when debugging the output of RABBIT. 

TEMPLOC is a little utility which given the number (in the DEP ordering 
used by DEPTH-ANALYZE) of a register returns the name of that register. 
**CONT+ARG-REGS** is the same as **ARGUMENT-REGISTERS** except that the name 
**CONT** is tacked onto the front. **CONT** is considered to be register 0. If N 
is greater than the number of the highest standard argument register, then a new 
register name of the form "-N-" is invented. Thus the additional temporary 
registers are called -11-, -12-, -13-, etc. 
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002 ;;; DEPROGNIFY IS USED ONLY TO MAKE THE OUTPUT PRETTY BY ELIMINATING 

003 ;;; UNNECESSARY OCCURRENCES OF "PROGN". 
004 

005 (OEFMAC DEPROGNIFY (FORM) "(DEPROGNIFY1 .FORM NIL)) 

006 

007 (SET' *DEPROGNIFY-COUNT* 0) 

008 

009 (DEFINE DEPROGNIFY1 

010 (LAMBDA (FORM ATOMFLUSHP) 

Oil (IF (OR (ATOM FORM) (NOT (EQ (CAR FORM) 'PROGN))) 

012 (LIST FORM) 

013 (DO ((X (CDR FORM) (CDR X)) 

014 (Z NIL (COND ((NULL (CDR X)) (CONS (CAR X) Z)) 

015 ((NULL (CAR X)) 

016 (INCREMENT *DEPROGNIFY-COUNT*) 

017 Z) 

018 ((ATOM (CAR X)) 

019 (COND (ATOMFLUSHP 

020 (INCREMENT *DEPROGNIFY-COUNT* ) 

021 Z) 

022 (T (CONS (CAR X) Z)))) 

023 ((EQ (CAAR X) 'QUOTE) 

024 (INCREMENT *DEPROGNIFY-COUNT*) 

025 Z) 

026 (T (CONS (CAR X) Z))))) 

027 ((NULL X) (NREVERSE Z)))))) 
028 

029 (DEFINE 
030 
031 
032 
033 

034 (IMPLODE (APPEND '(-) (EXPLOOEN N) '(-))) 

035 
036 

037 (LOOP (CDR REGS) (- J 1))))))) 

038 



TEMPLOC 






(LAMBDA (N) 






(LABELS 


( ( LOOP 






(LAMBDA (REGS J) 






(IF (NULL REGS) 






(IMPLODE (APPEND '(- 






(IF (= J 0) 






(CAR REGS) ' 






(LOOP (CDR REGS) 




(LOOP **CONT+ARG-REGS** N)))) 
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ENVCARCDR takes a set of variables VARS representing the consed 
environment, and an old rename list RNL, and adds to RNL new entries for the 
variables, supplying pieces of code to access the environment structure. For 
example, suppose RNL were NIL, and VARS were (A B C). Then ENVCARCDR would 
produce the list: 

((C . (CAR (CDR (CDR **ENV**)))) 
(B . (CAR (CDR **ENV**))) 
(A . (CAR **ENV**))) 

where each variable has been paired with a little piece of code which can be used 
to access it at run time. This example is not quite correct, however, because 
the peephole optimizer DECARCDRATE is called on the little pieces of code; 
DECARCDRATE collapses CAR-CDR chains to make them easier to read, and so the true 
result of ENVCARCDR would be: 

((C . (CADDR **ENV**)) 
(B . (CADR **ENV**)) 
(A . (CAR **ENV**))) 
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002 (DEFINE ENVCARCDR 

003 (LAMBDA (VARS RNL) 

004 (DO ((X '**ENV** "(CDR ,X)) 

005 (V VARS (CDR V)) 

006 (R RNL (CONS (CONS (CAR V) (DECARCORATE "(CAR ,X))) R))) 

007 ((NULL V) R)))) 
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REGSLIST takes a CLAMBDA cnode, a switch AVP, and a rename list RNL. It 
tacks onto RNL new entries which describe how to access the arguments of the 
CLAMBDA. This is complicated because there are three cases. (1) A NOCLOSE 
function takes its arguments in non-standard registers. (2) Other functions of 
not more than **NUNBER-OF-ARGUMENT-REGISTERS** (the length of the **ARGUMENT- 
REGISTERS** list) arguments takes their arguments in the standard registers. (3) 
All other functions takes a list of arguments in the first argument register 
(**ONE**), except for the continuation in **CONT**. The switch AVP tells whether 
or not the elements of A5ETVARS should be included (non-nil means do not 
include). 

As an example, suppose the CLAMBDA is a NOCLOSE with DEP = 12 and TVARS = 
(A B C D), and suppose that AVP = T and RNL = NIL. Then the result would be: 

((D . -15-) (C . -14-) (B . -13-) (A . -12-)) 

As another example, suppose the CLAMBDA is of type EZCLOSE with VARS = (U Y Z) 
and ASETVARS = (Y), and suppose that AVP = NIL and RNL = ((A . -12-)). Then the 
result would be: 



((Z . **THREE**) (X . **ONE**) (K . **CONT**) (A . -12-)) 



SET-UP-ASETVARS takes a piece of code (the code for a CLAMBDA body), an 
ASETVARS set AV, and a rename list. If there are no ASETVARS, then just the code 
is returned, but otherwise a PROGN-form is returned, which ahead of the code has 
a SETQ which adds the ASETVARS to the environment. (LOOKUPICATE takes a variable 
and a RNL and returns a piece of code for referring to that variable.) For 
example, suppose we had: 

CODE = (GO FOO) 

AV = (A C) 

RNL = ((C . -14-) (B . -13-) (A . -12-)) 

Then SET-UP-ASETVARS would return the code: 

(PROGN (SETQ **ENV** (CONS -12- (CONS -14- **ENV**))) (GO FOO)) 
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002 ;;; AVP NON-NIL MEANS THAT ASETVARS ARE TO BE EXCLUDED FROM THE CONSED LIST. 
003 

004 (DEFINE REGSLIST 

005 (LAMBDA (CLAM AVP RNL ) 

006 (LET ((AV (AND AVP (CLAMBDA\ASETVARS CLAM)))) 

007 (IF (EQ (CLAMBDA\FNP CLAM) 'NOCLOSE) 

008 (DO ((J (CLAMBDAXDEP CLAM) (+ J 1)) 

009 (TV (CLAMBDANTVARS CLAM) (CDR TV)) 

010 (R RNL 

Oil (IF (MEMO (CAR TV) AV) 

012 R 

013 (CONS (CONS (CAR TV) ( TEMPLOC J)) R)))) 

014 ((NULL TV) R)) 

015 (LET ((VARS (CLAMBDAWARS CLAM))) 

016 (IF (> (LENGTH (COR VARS)) **NUMBER-OF-ARG-REGS**) 

017 (DO ((X (CAR **ARGUMENT-REGISTERS**) "(CDR ,X)) 

018 (V (CDR VARS) (CDR V)) 

019 (R (CONS (CONS (CAR VARS) ***CONT**) RNL) 

020 (IF (MEMO (CAR V) AV) 

021 R 

022 (CONS (CONS (CAR V) (DECARCDRATE "(CAR ,X))) R)))) 

023 ((NULL V) R)) 

024 (DO ((V VARS (CDR V)) 

025 (X **CONT+ARG-REGS** (CDR X)) 

026 (R RNL 

027 (IF (MEMO (CAR V) AV) 

028 R 

029 (CONS (CONS (CAR V) (CAR X)) R)))) 

030 ((NULL V) R)))))))) 
031 

032 (DEFINE SET-UP-ASETVARS 

033 (LAMBDA (CODE AV RNL) 

034 (IF (NULL AV) 

035 CODE 

036 "(PROGN (SETO **ENV** 

037 ,(00 ((A (REVERSE AV) (CDR A)) 

038 (E '**ENV** "(CONS .(LOOKUPICATE (CAR A) RNL) ,E))) 

039 ((NULL A) E))) 

040 P(DEPROGNIFY CODE))))) 
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In the continuation-passing style, functions do not return values; 
instead, they apply a continuation to the value. Thus, the body of a CLAMBDA- 
expression is a form which is not expected to produce a value. On the other 
hand, such a form will have subforms which do produce values, for example 
references to variables. 

Thus the forms to be dealt with in the code generator can be divided into 
those which produce values and those which do not. Initially the latter will 
always be attacked, as the body of a "function"; later the former will be seen. 
COMP-BODY takes a valueless form and compiles it. The routine ANALYZE, which we 
will see later, handles valued forms. 

CONP-BODY instantiates a by now familiar theme: it simply dispatches on 
the type of BODY to some specialist routine. In the case of a CLABELS, it first 
compiles the body of the CLABELS (which itself is valueless if the CLABELS is 
valueless, and so a recursive call to COMP-BODY is used), and then goes to 
PRODUCE-LABELS. For a COMBINATION or RETURN, it does a three-way (for RETURN, 
two-way) sub-dispatch on whether the function is a TRIVFN, a CLAMBDA (or 
CONTINUATION), or something else. 

The PRODUCE series of routines produce code for valueless forms. 
PRODUCE- IF calls ANALYZE on the predicate (which will produce a value), and COMP- 
BODY on the consequent and alternative (which produce no value because the entire 
CIF does not). The three pieces of resulting code are respectively called PRED, 
CON, and ALT. These are then given to CONDICATE, which generates a NacLISP COND 
form to be output. 
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002 ;;; RNL IS THE "RENAME LIST": AN ALIST DESCRIBING HOW TO REFER TO THE VARIABLES IN THE 

003 ;;; ENVIRONMENT. CENV IS THE CONSEO ENVIRONMENT SEEN BY THE BODY. 
004 

005 (DEFINE 

006 COMP-BODY 

00 7 (LAMBDA (BODY RNL PROGNAME BLOCKFNS CENV FNS C) 

008 (LET ((CFM (CNODENCFORM BODY))) 

009 (EQCASE (TYPE CFM) 

010 (CIF 

Oil (PRODUCE- IF BODY RNL PROGNAME BLOCKFNS CENV FNS C)) 

012 (CASET 

013 (PRODUCE-ASET BODY RNL PROGNAME BLOCKFNS CENV FNS C)) 

014 (CLABELS 

015 (OR (EQUAL CENV (CLABELS\CONSENV CFM)) 

016 (ERROR • [Environment disagreement! BODY 'FAIL-ACT)) 

017 (LET ((LCENV (APPEND (CLABELS\FNENV CFM) CENV))) 

018 (COMP-BODY 

019 (CLABELS\BODY CFM) 

020 (ENVCARCDR LCENV RNL) 

021 PROGNAME 

022 BLOCKFNS 

023 LCENV 

024 FNS 

025 (LAMBDA (LBOD FNS) 

026 (PRODUCE-LABELS BODY LBOD RNL PROGNAME BLOCKFNS FNS C))))) 

027 (CCOMBINATION 

028 (LET ((FN (CNODENCFORM (CAR (CCOMBINATION\ARGS CFM))))) 

029 (COND ((EQ (TYPE FN) 'CLAMBDA) 

030 (PRODUCE -LAMBDA-COMB I NAT ION BODY RNL PROGNAME BLOCKFNS CENV FNS C)) 

031 ((AND (EQ (TYPE FN) 'TRIVIAL) 

032 (EQ (TYPE (NODENFORM (TRIVIAL\NODE FN))) 'VARIABLE) 

033 (TRIVFN (VARIABLEWAR (NODE\FORM (TRIVIALNNODE FN))))) 

034 (PRODUCE-TRIVFN-COMBINATION BODY RNL PROGNAME BLOCKFNS CENV FNS C)) 

035 (T (PRODUCE-COMBINATION BODY RNL PROGNAME BLOCKFNS CENV FNS C))))) 

036 (RETURN 

037 (LET ((FN (CNODENCFORM (RETURN\CONT CFM)))) 

038 (IF (EQ (TYPE FN) 'CONTINUATION) 

039 (PRODUCE-CONTINUATION-RETURN BODY RNL PROGNAME BLOCKFNS CENV FNS C) 

040 (PRODUCE-RETURN BODY RNL PROGNAME BLOCKFNS CENV FNS C)) )))))) 
041 

042 (DEFINE PRODUCE- IF 

043 (LAMBDA (CNODE RNL PROGNAME BLOCKFNS CENV FNS C) 

044 (LET ((CFM (CNODENCFORM CNODE))) 

045 (ANALYZE (CIFNPREO CFM) 

046 RNL 

04 7 PROGNAME 

048 BLOCKFNS 

049 FNS 

050 (LAMBDA (PRED FNS) 

051 (COMP-BODY (CIFNCON CFM) 

052 RNL 

053 PROGNAME 

054 BLOCKFNS 

055 CENV 

056 FNS 

057 (LAMBDA (CON FNS) 

058 (COMP-BODY (CIFNALT CFM) 

059 RNL 

060 PROGNAME 

061 BLOCKFNS 

062 CENV 

063 FNS 

064 (LAMBDA (ALT FNS) 

065 (C (CONDICATE PRED 

066 CON 

067 ALT) 

068 FNS)))))))))) 
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PRODUCE-ASET first calls ANALYZE on the body, which must produce a value 
(to be assigned to the CASET variable). There are then two cases, depending on 
whether the CASET\CONT is a CONTINUATION or not. 

If it is, then the body of the continuation is compiled (using COMP- 
BODY), and then LAMBDACATE is called to generate the invocation of the 
continuation. The routine OUTPUT-ASET generates the actual MacLISP SETQ (or 
other construct) for the CASET variable, using the environment location provided 
by LOOKUPICATE. All in all this case is very much like a RETURN with an explicit 
CONTINUATION, except that just before the continuation is invoked a SETQ is stuck 
in . 

If the CASET\CONT is not a CONTINUATION, then ANALYZE is called on the 
CASET\CONT, and then a piece of code is output which sets **FUN** to the 
continuation, **ONE** (which is in the car of **ARGUMENT-REGISTERS**) to the 
value of the body (after also setting the CASET variable, using OUTPUT-ASET), and 
does (RETURN NIL), which is the SCHEME run-time protocol for invoking a 
continuation. 
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002 (DEFINE 

003 PRODUCE-ASET 

004 (LAMBDA (CNODE RNL PROGNAME BLOCKFNS CENV FNS C) 

005 (LET ((CFM (CNODE\CFORM CNODE))) 

006 (ANALYZE (CASEABODY CFM) 
0.0 7 RNL 

008 PROGNAME 

009 BLOCKFNS 

010 FNS 

Oil (LAMBDA (BODY FNS) 

012 (LET ((CONTCFM (CNODE\CFORM (CASET\CONT CFM)))) 

013 (IF (EQ (TYPE CONTCFM) 'CONTINUATION) 

014 (COMP-BODY (CONTINUATION\BODY CONTCFM) 

015 (IF (CONTINUATIONUVARS CONTCFM) 

016 (CONS (CONS (CAR (CONTINUATION\TVARS CONTCFM)) 

017 (TEMPLOC (CONTINUATIONNDEP 

018 CONTCFM))) 

019 (ENVCARCOR CENV RNL)) 

020 (ENVCARCOR CENV RNL)) 

021 PROGNAME 

022 BLOCKFNS 

023 CENV 

024 FNS 

025 (LAMBDA (CODE FNS) 

026 (C (LAMBDACATE 

027 (LIST (CONTINUATIONWAR CONTCFM)) 

028 (CONTINUATIONUVARS CONTCFM) 

029 (CONTINUATION\DEP CONTCFM) 

030 (LIST (OUTPUT-ASET 

031 ( LOOKUP ICATE (CASETWAR CFM) 

032 RNL) 

033 BODY)) 

034 (REMARK-ON (CASET\CONT CFM)) 

035 «**ENV** 
36 CODE) 

037 FNS))) 

038 (ANALYZE 

039 (CASETXCONT CFM) 

040 RNL 

041 PROGNAME 

042 BLOCKFNS 

043 FNS 

044 (LAMBDA (CONT FNS) 

045 (C "(PROGN (SETQ **FUN** ,CONT) 

046 (SETQ .(CAR **ARGUMENT-REGISTERS** ) 

047 .(OUTPUT-ASET 

048 (LOOKUP ICATE (CASETWAR CFM) 

049 RNL) 

050 BODY)) 

051 (RETURN NIL)) 

052 FNS)))))))))) 
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PRODUCE-LABELS takes an already-compiled body LBOD. FNENV-FIX is a 
(possibly empty) list of pieces of code which will fix up the consed environment 
by adding the variables common to all the closures to be made up (this set was 
computed by CLOSE-ANALYZE and put in the FNENV slot of the CLABELS). The code 
for this addition is built from the list of variables by CONS-CLOSEREFS. 

There are then three cases, depending on the type of closures to be 
constructed (NOCLOSE, EZCLOSE, or NIL). Suppose that the CLABELS is: 

(CLABELS ((FOO (LAMBDA ...)) 
(BAR (LAMBDA ...))) 
<body>) 

Let us see roughly what code is produced for each case. 

For a NIL type (full closures), the idea is merely to create all the 
closures m standard form (but with a null environment), add them all to the 
consed environment, and then go back and clobber the environment portion of the 
closures with the new resulting environment, plus any other variables needed. 
Now a standard closure looks like (CBETA <value of progname> <tag> 
<environment>). (At run time the value of the progname will be a MacLISP SUBR 
pointer for the module; the tag identifies the particular routine in the 
module.) In the DO loop, FNS accumulates the function definitions (to be 
compiled separately later), RP accumulates RPLACD forms for clobbering the 
closures and CB accumulates constructors of CBETA lists. For our example, the 
generated code looks like: ^ ' 

((LAMBDA (FOO BAR) 

(SETQ **ENV** (CONS ... (CONS X43 **ENV***).. )) 

(RPLACD (CDDR BAR) (CONS ... (CONS X72 **ENV**) . . . )) 

(RPLACD (CDDR FOO) (CONS ... (CONS X69 **ENV** ) . . . ) ) 

<body>) ' " 
(LIST 'CBETA ?-453 'FOO-TAG) 
(LIST 'CBETA ?-453 'BAR-TAG)) 

^ ere RA J' T 4 A 5 r 3 is thf PROGNAME for the module containing the CLABELS, and FOO-TAG 
2 51 5i, are ta9S (whose names wiU actu aHy look like FNVAR-91) for FOO 

and fo ; h p ( f?r,t in , P f Tn Ct C ^ E : A N ALYZE creat « * null FNENV for type NIL CLABELS. 
1 f L L TQ T ld in faCt n0t appear * H ™ever, the decision as to the 

form of the FNENV is only a heuristic, and so PRODUCE-LABELS is written so as to 

be prepared for any possible choice of FNENV and CLOSEREFS of individual labelled 

w^Tk' • "«. thi t Way the heuristic ^ CLOSE-ANALYZE can be freely adjusted 
without having to change PRODUCE-LABELS.) J 

For the EZCLOSE case the "closures" need only contain environments, not 
also code pointers. A trick is needed here, however, to build the circular 
environment. When adding the labelled functions to the environment, we must 
somehow cons in an object; but we want this object to possibly be the 
environment itself! What we do instead is to make up a list of the tag and 

useful R fn A r C H l h±S USt VL l * With the envi — t. The'tag is never used but Is 
r*!l \h ^"Q.ging- Th is method also makes the code very similar to the NIL 
case, the only difference being that the atom CBETA and the value of the PROGNAME 
are not consed onto each closure. wt 



001 RABBIT 56 8 05/15 /78 Page 47 

002 (DEFINE 

003 PRODUCE-LABELS 

004 (LAMBDA (CNODE LBOD RNL PROGNAME BLOCKFNS FNS C) 

005 (LET ((CFM (CNODE\CFORM CNODE))) 

006 (LET ((VARS (CLABELSNFNVARS CFM)) 

007 (DEFS (CLABELSNFNDEFS CFM)) 

008 (FNENV (CLABELS\FNENV CFM))) 

009 (LET ((FNENV-FIX (IF FNENV "((SETQ **ENV** .(CONS-CLOSEREFS FNENV RNL)))))) 

010 (EQCASE (CLABELSNEASY CFM) 
Oil (NIL 

012 (DO ((V VARS (CDR V)) 

013 (D DEFS (CDR D)) 

014 (FNS FNS (CONS (LIST PROGNAME (CAR D) NIL) FNS)) 

015 (RP NIL (CONS "(RPLACD (CDDR .(CAR V)) 

016 .(CONS-CLOSEREFS 

017 (CLAMB0A\CLOSEREFS 

018 (CNODE\CFORM (CAR D))) 

019 RNL)) 

020 RP)) 

021 (CB NIL (CONS "(LIST 'CBETA .PROGNAME '.(CAR V)) CB))) 

022 ((NULL.V) 

023 (C "((LAMBDA .VARS 

024 gFNENV-FIX 

025 SRP 

026 g(DEPROGNIFY LBOD)) 

027 UNREVERSE CB)) 

028 FNS)))) 

029 (EZCLOSE 

030 (DO ((V VARS (CDR V)) 

031 (D DEFS (COR D)) 

032 (FNS FNS (CONS (LIST PROGNAME (CAR D) NIL) FNS)) 

033 (RP NIL (CONS "(RPLACD .(CAR V) 

034 .(CONS-CLOSEREFS 

035 (CLAMBDA\CLOSEREFS 

36 (CNODENCFORM (CAR D))) 

037 RNL)) 

038 RP)) 

039 (CB NIL (CONS "(LIST '.(CAR V)) CB))) 

040 ((NULL V) 

041 (C "((LAMBDA .VARS 

042 GFNENV-FIX 

043 «RP 

044 g(DEPROGNIFY LBOD)) 

045 UNREVERSE CB)) 

046 FNS)))) 
04 7 (NOCLOSE 

048 (C "(PROGN gFNENV-FIX g(DEPROGNIFY LBOD)) 

049 (DO ((V VARS (CDR V)) 

050 (D DEFS (CDR D)) 

051 (FNS FNS (CONS (LIST PROGNAME (CAR D) RNL) FNS))) 

052 ((NULL V) FNS)))))))))) 
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One problem is that these "closures" are not of the same form as ordinary 
EZCLOSE closures, which do not have the tag. This is the purpose of the LABELS- 
FUNCTION properties which BIND-ANALYZE created; when a call to an EZCLOSE 
function is generated, the presence of a LABELS-FUNCTION property indicates that 
the "closure" itself is not the environment, but rather its cdr is. (It would be 
possible to do without the cell containing the tag, by instead making up the 
environment with values of NIL, then constructing the "closures" as simple 
environments, and then going back and clobbering the environment structure with 
the closure objects, rather than clobbering the closure objects themselves. The 

2™;^" n0t t0 d0 this was ratner arbitrary.) The generated code for the 
EZCLOSE case thus looks like: 

((LAMBDA (FOO BAR) 

(SETQ **ENV** (CONS ... (CONS X43 **ENV***) . . . ) ) 
(RPLACD (CDDR BAR) (CONS ... (CONS X72 **ENV**) . . . ) ) 
(RPLACD (CDDR FOO) (CONS ... (CONS X69 **ENV**) . . . ) ) 
<body>) 

(LIST 'FOO-TAG) 

(LIST 'BAR-TAG)) 

In the NOCLOSE case, no closures are made at run time for the labelled 
functions, and so the code consists merely of the FNENV-FIX (which, again, using 
the current heuristic in CLOSE-ANALYZE will always be null in the NOCLOSE case) 
and the code for the body: 

(PROGN (SETQ **ENV** (CONS ... (CONS X43 **ENV**) . . . ) ) <body>) 

In any case, of course, the labelled functions are added to the FNS list which is 
handed back to C for later compilation. 

PRODUCE-LAMBDA-COMBINATION generates code for the case of ((CLAMBDA ...) 
argl ... argn). First a number of consistency checks are performed, to make 
sure the pass-2 analysis is not completely awry. Then code is generated for the 
body of the CLAMBDA, using COMP-BODY. Then all the arguments, which are of 
A?.*r5Sr expected t0 P ro duce values, are given to MAPANALYZE, which will call 
ANALYZE on each in turn and return a list of the pieces of generated code (here 
called ARGS in the continuation handed to MAPANALYZE). Finally, LAMBDACATE is 
called to generate the code for entering the body after setting up the arguments 
in an appropriate manner. Notice the use of SET-UP-ASETVARS to generate any 
necessary additional code for adding ASETVARS to the consed environment on 
entering the body. (A more complicated compiler would in this situation add the 
argument values to the consed environment directly, rather than first putting 
them in registers (which is done by LAMBDACATE) and then moving the registers 
into the consed environment (which is done by SET-UP-ASETVARS). To do this 

«T e nS r :c™In«. inVOlve destr °y in 9 the modular distinction between LAMBDACATE and 
5ti-up-ASETVARS. The extra complications were deemed not worthwhile because in 
practice the ASETVARS set is almost always empty anyway.) 



001 RABBIT 568 ._ 05/15/78 Page 48 

002 (DEFINE 

003 PRODUCE-LAMBDA-COMBINATION 

004 (LAMBDA (CNODE RNL PROGNAME BLOCKFNS CENV FNS C) 

005 (LET ((CFM (CNO0E\CFORM CNODE))) 

006 (LET ((FN (CNODEXCFORM (CAR (CCOMBINATION\ARGS CFM))))) 

007 (AND (CLAMBDA\CLOSEREFS FN) 

008 (ERROR • (Functional LAMBDA has CLOSEREFS| CNODE 'FAIL-ACT)) 

009 (OR (EQUAL CENV (CLAMBDA\CONSENV FN)) 

010 (ERROR ' (Environment disagreement! CNODE 'FAIL-ACT)) 
Oil (OR (EQ (CLAMBDANFNP FN) 'NOCLOSE) 

012 (ERROR • |Non-NOCLOSE LAMBDA in function position| CNODE 'FAIL-ACT)) 

013 (COMP-BODY 

014 (CLAMBDA\BODY FN) 

015 (ENVCARCDR (CLAMBDA\ASETVARS FN) 

016 (REGSLIST FN T (ENVCARCDR CENV RNL))) 

017 PROGNAME 

018 BLOCKFNS 

019 (APPEND (CLAMBDA\ASETVARS FN) CENV) 

020 FNS 

021 (LAMBDA (BODY FNS) 

022 (MAPANALYZE (CDR (CCOMBINATION\ARGS CFM)) 

023 RNL 

024 PROGNAME 

025 BLOCKFNS 

026 FNS 

02 7 (LAMBDA (ARGS FNS) 

028 (C (LAMBDACATE (CLAMBDA\VARS FN) 

029 (CLAMBDANTVARS FN) 

030 (CLAMBDA\DEP FN) 

031 ARGS 

032 (REMARK-ON 

033 (CAR (CCOMBINATION\ARGS CFM))) 

034 '**ENV** 

035 (SET-UP-ASETVARS 

036 BODY 

037 (CLAMBOAXASETVARS FN) 

038 (REGSLIST FN NIL NIL))) 

039 FNS))))))))) 
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PRODUCE-TRIVFN-CONBINATION handles a case like (CONS continuation argl 
argZ), i.e. a CCONBINATION whose function position contains a TRIVFN. First all 
the arguments ( excluding the continuation!) are given to MAPANALYZE; then a 
dispatch is made on whether the continuation is a CONTINUATION or a CVARIABLE, 
and one of two specialists is called. 

PRODUCE-TRIVFN-COMBINATION-CONTINUATION handles a case like (CONS 
(CONTINUATION (Z) <body>) argl argZ). The idea here is to compile it 
approximately as if it were 

((CONTINUATION (Z) <body>) (CONS argl arg2)) 

That is, the arguments are evaluated, the trivial function is given them to 
produce a value, and that value is then given to the continuation. Accordingly, 
the body of the CONTINUATION is compiled using COMP-BODY, and then LAMBDACATE 
takes care of setting up the argument (the fourth argument to LAMBDACATE is a 
list of the MacLISP code for invoking the trivial function) and invoking the body 
of the (necessarily NOCLOSE) CONTINUATION. 
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002 (DEFINE PRODUCE-TRIVFN-COMBINATION 

00 3 (LAMBDA (CNODE RNL PROGNAME BLOCKFNS CENV FNS C) 

004 (LET ((CFM (CNODE\CFORM CNODE))) 

005 (LET ((FN (CNODE\CFORM (CAR (CC0MBINATI0N\AR6S CFM)))) 

006 (CONT (CNO0E\CFORM (CADR (CC0MBINATI0N\AR6S CFM))))) 
00 7 (MAPANALYZE (CDDR (CCOMBINATION\ARGS CFM)) 

008 RNL 

009 PROGNAME 

010 BLOCKFNS 
Oil FNS 

012 (LAMBDA (ARGS FNS) 

013 (EQCASE (TYPE CONT) 

014 (CONTINUATION 

015 (PRODUCE-TRIVFN-COMBINAT ION-CONTINUATION 

016 CNODE RNL PROGNAME BLOCKFNS CENV 

017 FNS C CFM FN CONT ARGS)) 

018 (CVARIABLE 

019 (PRODUCE-TRIVFN-COMBINATION-CVARIABLE 

020 CNODE RNL PROGNAME BLOCKFNS CENV 

021 FNS C CFM FN CONT ARGS))))))))) 
022 

023 (DEFINE PRODUCE-TRIVFN-COMBINATION-CONTINUATION 

024 (LAMBDA (CNODE RNL PROGNAME BLOCKFNS CENV FNS C CFM FN CONT ARGS) 

025 (BLOCK (AND (CONTINUATION\CLOSEREFS CONT) 

026 (ERROR • | CONTINUATION for TRIVFN has CLOSEREFS| CNODE 'FAIL-ACT)) 

027 (OR (EO (CONTINUATION\FNP CONT) 'NOCLOSE) 

028 (ERROR ' |Non-NOCLOSE CONTINUATION for TRIVFN| CNODE 'FAIL-ACT)) 

029 (COMP-BODY (CONTINUATION\BODY CONT) 
030- (IF (CONTINUATION\TVARS CONT) 

031 (CONS (CONS (CAR (CONTINUATION\TVARS CONT)) 

032 (TEMPLOC (CONTINUATION\DEP CONT))) 

033 (ENVCARCOR CENV RNL)) 

034 (ENVCARCDR CENV RNL)) 

035 PROGNAME 

036 BLOCKFNS 

037 CENV 

038 FNS 

039 (LAMBDA (BODY FNS) 

040 (C (LAMBDACATE 

041 (LIST (CONTINUATIONWAR CONT)) 

042 (CONTINUATIONYTVARS CONT) 

043 (CONTINUATIONNDEP CONT) 

044 (LIST "(.(VARIABLEWAR (NODE\FORM ( TRIVIAL\NODE FN))) 

045 8ARGS)) 

046 (REMARK-ON (CADR (CCOMBINATION\ARGS CFM))) 
04 7 '**ENV** 

048 BODY) 

049 FNS)))))) 
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PRODUCE-TRIVFN-COMBINATION-CVARIABLE handles a case like (CONS CONT-43 
argl argZ), where the continuation for a trivial function call is a CVARIABLE 
In this situation the continuation is given to ANALYZE to generate MacLISP code 
for referring to it; there are then two cases, depending on whether the 
CVARIABLE has a KNOWN-FUNCTION property. (Note that before the decision is made 
VAL names the piece of MacLISP code for calling the trivial function on the 
arguments.) 

If the CVARIABLE denotes a KNOWN-FUNCTION, then it should be possible to 
invoke it by adjusting the environment, setting up the arguments in registers, 

SnmirU P r!??i., t0 the C ° de * FirSt the environm ent adjustment is computed; ADJUST- 
KNOWNFN-CENV generates a piece of MacLISP code which will at run time compute the 
correct new environment in which the continuation will expect to run. There are 
then two subcases, depending on whether the KNOWN-FUNCTION is of type NOCLOSE or 
not. If it is, then LAMBDACATE is used to set up the arguments in the 
"K P H°H ri l . regi u sters < tne last argument of NIL indicates that there is no 
body , but rather that the caller of LAMBDACATE takes the responsibility of 
jumping to the code). If it is not, then PSETQIFY is used, because the value 
will always go in **ONE** (which is the car of **ARGUMENT-REGISTERS**) . In 
either case, a GO is generated to jump to the code (within the current module, of 
course) for the continuation. 

If the continuation is not a KNOWN- FUNCTION, then the standard function 
linkage mechanism is used: the continuation is put into **FUN**, the value into 
**ONE**, and then (RETURN NIL) exits the module to request the SCHEME run-time 
interface to invoke the continuation in whatever manner is appropriate. 
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002 (DEFINE PRODUCE-TRIVFN-COMBINATION-CVARIABLE 

003 (LAMBDA (CNODE RNL PROGNAME BLOCKFNS CENV FNS C CFM FN CONT ARGS) 

004 (ANALYZE 

005 (CADR (CCOMBINATION\ARGS CFM)) 

006 RNL 

00 7 PROGNAME 

008 BLOCKFNS 

009 FNS 

010 (LAMBDA (CONTF FNS) 

Oil (LET ((KF (GET (CVARIABLEWAR CONT) 'KNOWN-FUNCTION)) 

012 (VAL "(,( VARIABLE WAR (NODE\FORM ( TRIVIAL\NODE FN))) 3ARGS ) ) ) 

013 (IF KF 

014 (LET ((KCFM (CNODE\CFORM KF ) ) ) 

015 (LET ((ENVADJ 

016 (ADJUST-KNOWNFN-CENV CENV 

017 (CVARIABLEWAR CONT) 

018 CONTF 

019 (CONTINUATION\FNP KCFM) 

020 (APPEND 

021 (CONTINUATIONNCLOSEREFS KCFM) 

022 (CONTINUATION\CONSENV KCFM))))) 

023 (C "(PROGN 

024 8(IF (EQ (CONTINUATIONNFNP KCFM) 

025 'NOCLOSE) 

026 (DEPROGNIFY 

027 (LAMBOACATE (LIST (CONTINUATIONWAR KCFM)) 

028 (CONTINUATION\TVARS KCFM) 

029 (CONTINUATIONNDEP KCFM) 

030 (LIST VAL) 

031 (REMARK-ON KF ) 
32 ENVADJ 

033 NIL)) 

34 (PSETQIFY (LIST ENVADJ VAL) 

035 (LIST '**ENV** 

036 (CAR **ARGUMENT-REGISTERS**)))) 

037 (GO ,(CONTINUATION\NAME KCFM))) 

038 FNS))) 

039 (C "(PROGN (SETQ **FUN** , CONTF ) 

040 (SETQ .(CAR **ARGUMENT-REGISTERS** ) ,VAL) 

041 (RETURN NIL)) 

042 FNS))))))) 
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PRODUCE-COMBINATION handles combinations whose function positions contain 
neither TRIVFNs nor CLAMBDAs. All of the arguments, including the function 
position itself and the continuation, are given to MAPANALYZE, resulting in a 
list FORM of pieces of MacLISP code. There are then two cases. If the function 
position is a VARIABLE (within a TRIVIAL - not a CVARIABLE!), then PRODUCE- 
COMBINATION-VARIABLE is used. Otherwise code is generated to use the standard 
SCHEME run-time interface: first set **FUN** to the function, then set up the 
arguments in the standard argument registers (PSETQ-ARGS generates the code for 
this), then set **NARGS** to the number of arguments (this does not include the 
continuation), and exit the module with (RETURN NIL). 

PRODUCE-COMBINATION-VARIABLE first determines whether the variable has a 
KNOWN-FUNCTION property. If so, then the approach is very much as in TRIVFN- 
COMBINATION-CVARIABLE: first the environment adjustment is computed, then either 
LAMBDACATE or PSETQ-ARGS-ENV is used to adjust the environment and set up the 
arguments, and finally a GO to the piece of code for the KNOWN-FUNCTION is 
generated. 

If the variable is not a KNOWN- FUNCTION, then it may still be in the list 
BLOCKFNS (which, recall, is a list of user functions included in this module). 
If so, the effect on the code generation strategy is roughly as if it were a 
KNOWN-FUNCTION. The environment adjustment is done differently, but a GO is 
generated to the piece of code for the called function. 

In any other case, the standard interface is used. **FUN** is set to the 
function, the arguments are set up, **NARGS** is set to the number of arguments, 
and (RETURN NIL) exits the module. 
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002 (DEFINE PRODUCE-COMBINATION 

003 (LAMBDA (CNODE RNL PROGNAME BLOCKFNS CENV FNS C) 

004 (MAPANALYZE (CCOMBINATION\ARGS (CNODE\CFORM CNODE)) 

005 RNL 

006 PROGNAME 
00 7 BLOCKFNS 

008 FNS 

009 (LAMBDA (FORM FNS) 

010 (C (LET ((F (CNODE\CFORM (CAR (CCOMBINATION\ARGS 

Oil (CNODEXCFORM CNODE)))))) 

012 (IF (AND (EQ (TYPE F) 'TRIVIAL) 

013 (EQ (TYPE (NODEXFORM ( TRIVIAL\NODE F))) 

014 'VARIABLE)) 

015 (LET ((V (VARIABLEWAR 

016 (NODEXFORM ( TRIVIAL\NODE F))))) 

017 (PRODUCE-COMBINATION-VARIABLE 

018 CNODE RNL PROGNAME BLOCKFNS CENV 

019 FNS C FORM V (GET V 'KNOWN-FUNCTION))) 

020 "(PROGN (SETQ **FUN** .(CAR FORM)) 

021 §(PSETQ-ARGS (CDR FORM)) 

022 (SETQ **NARGS** • .(LENGTH (CDDR FORM))) 

023 (RETURN NIL)))) 

024 FNS))))) 
025 

026 (DEFINE PRODUCE-COMBINATION-VARIABLE 

027 (LAMBDA (CNODE RNL PROGNAME BLOCKFNS CENV FNS C FORM V KFN) 

028 (IF KFN 

029 (LET ((ENVADJ 

030 (ADJUST-KNOWNFN-CENV CENV 

031 V 

032 (CAR FORM) 

033 (CLAMBDAXFNP (CNODEXCFORM KFN)) 

034 (APPEND (CLAMBDAXCLOSEREFS (CNODEXCFORM KFN)) 

035 (CLAMBDAXCONSENV (CNODEXCFORM KFN)))))) 

036 (OR (EQ (TYPE (CNODEXCFORM KFN)) 'CLAMBDA) 

037 (ERROR ' | Known function not CLAMBDA| CNODE 'FAIL-ACT)) 

038 "(PROGN §(IF (EQ (CLAMBDAXFNP (CNODEXCFORM KFN)) 'NOCLOSE) 

039 (DEPROGNIFY 

040 (LAMBDACATE (CLAMBDAXVARS (CNODEXCFORM KFN)) , 

041 (CLAMBDAXTVARS (CNODEXCFORM KFN)) 

042 (CLAMBDAXDEP (CNODEXCFORM KFN)) 

043 (CDR FORM) 

044 (REMARK-ON KFN) 

045 ENVADJ 

046 NIL)) 

047 (PSETQ-ARGS-ENV (CDR FORM) ENVADJ)) 

048 (GO .(CLAMBDAXNAME (CNODEXCFORM KFN))))) 

049 (IF (ASSQ V BLOCKFNS) 

050 "(PROGN g(PSETQ-ARGS (CDR FORM)) 

051 P( IF (NOT (EQUAL (CLAMBDAXCONSENV 

052 (CNODEXCFORM 

053 (CADR (ASSQ V BLOCKFNS)))) 

054 CENV)) 

055 "((SETQ **ENV** (CDDDR .(CAR FORM))))) 

056 (GO .(CLAMBDAXNAME (CNODEXCFORM (CADR (ASSQ V BLOCKFNS)))))) 

057 "(PROGN (SETQ **FUN** .(CAR FORM)) 

058 P(PSETQ-ARGS (COR FORM)) 

059 (SETQ **NARGS** '.(LENGTH (CDDR FORM))) 

060 (RETURN NIL)))))) 



228 

ADJUST-KNOWNFN-CENV computes a piece of code for adjusting the 
environment. CENV is the internal representation (as a list of variable names) 
of the environment in which the generated code will be used. VAR is the name of 
the variable which names the function to be invoked, and for whose sake the 
environment is to be adjusted. VARREF is a piece of MacLISP code by which the 
run-time value of VAR may be accessed. FNP is the FNP type of the KNOWN-FUNCTION 
denoted by VAR. LCENV is the representation of the environment for the function. 
Thus, the generated code should compute LCENV given CENV. 

The two easy cases are when LCENV=CENV, in which case the environment 
does not change, and when LCENV=NIL, in which case the run-time environment will 
also be NIL. Otherwise it breaks down into three cases on FNP. 

For FNP=NOCLOSE, it must be true that LCENV is some tail of CENV; that 
is, there is a stack-like discipline for NOCLOSE functions, and so CENV was 
constructed by adding things to LCENV. The piece of code must therefore consist 
of some number of CDR operations on **ENV**. If this operation does not in fact 
produce LCENV, then there is an inconsistency in the compiler. 

For FNP=EZCLOSE, then VARREF can be used to reference the run-time 
"closure"; this may require a CDR operation if the function is an EZCLOSE 
LABELS-FUNCTION (see PRODUCE-LABELS). 

For FNP=NIL, then VARREF will refer to a full closure; the CDDDR of this 
closure is the environment. 

PRODUCE-CONTINUATION-RETURN is, mutatis mutandis , identical to PRODUCE- 
LAMBDA-COMBINATION. This is a good example of the fact that much code was 
duplicated because of the early design decision to treat COMBINATION and RETURN 
as distinct data types. 
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002 (DEFINE ADJUST-KNOWNFN-CENV 

003 (LAMBDA (CENV VAR VARREF FNP LCENV) 

004 (COND ((EQUAL LCENV CENV) '**ENV**) 

005 ((NULL LCENV) 'NIL) 

006 (T (EQCASE FNP 

00 7 (NOCLOSE 

008 (DO ((X CENV (COR X)) 

009 (Y '**ENV** "(COR ,Y)) 

010 (I (- (LENGTH CENV) (LENGTH LCENV)) (- I 1))) 
Oil ((< I 1) 

012 (IF (EQUAL X LCENV) 

013 (DECARCORATE Y) 

014 (ERROR ' |Cannot recover environment for known funct1on| 

015 VAR 

016 'FAIL-ACT))))) 

017 (EZCLOSE 

018 (IF (GET VAR 'LABELS-FUNCTION) 

019 "(CDR , VARREF) 

020 VARREF)) 

021 (NIL "(CDDOR .VARREF ))))))) 
022 

023 (DEFINE PRODUCE-CONTINUATION-RETURN 

024 (LAMBDA (CNODE RNL PROGNAME BLOCKFNS CENV FNS C) 

025 (LET ((CFM (CNODE\CFORM CNODE))) 

026 (LET ((FN (CNO0E\CFORM (RETURN\CONT CFM)))) 
.027 (AND (CONTINUATION\CLOSEREFS FN) 

028 (ERROR ' |Functiona1 CONTINUATION has CLOSEREFS| CNODE 'FAIL-ACT)) 

029 (OR (EQUAL CENV (CONTINUATION\CONSENV FN)) 

030 (ERROR ' (Environment disagreement! CNODE 'FAIL-ACT)) 

031 (OR (EQ (CONTINUATION\FNP FN) 'NOCLOSE) 

032 (ERROR ' |Non-NOCLOSE CONTINUATION in function pos1tion| 

033 CNODE 

034 'FAIL-ACT)) 

035 (COMP-BODY (CONTINUATION\BODY FN) 

036 (IF (CONTINUATIONYTVARS FN) 

03 7 (CONS (CONS (CAR (CONTINUATIONYTVARS FN)) 

038 (TEMPLOC (CONTINUATION\DEP FN))) 

039 (ENVCARCDR CENV RNL)) 

040 (ENVCARCDR CENV RNL)) 

041 PROGNAME 

042 BLOCKFNS 

043 CENV 

044 FNS 

045 (LAMBDA (BODY FNS) 

046 (ANALYZE (RETURNWAL CFM) 

04 7 RNL 

048 PROGNAME 

049 BLOCKFNS 

050 FNS 

051 (LAMBDA (VAL FNS) 

052 (C (LAMBDACATE 

053 (LIST (CONTINUATIONWAR FN)) 

054 (CONTINUATIONYTVARS FN) 

055 (CONTINUATION\DEP FN) 

056 (LIST VAL) 

057 (REMARK-ON (RETURN\CONT CFM)) 

058 '**ENV** 

059 BODY) 

060 FNS))))))))) 
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PRODUCE-RETURN and PRODUCE-RETURN- 1 together are almost identical to 
PRODUCE-COMBINATION and PRODUCE-COMBINATION-VARIABLE, except that the division 
between the two parts is different, and the BLOCKFNS trick is not applicable to 
RETURN. 

PRODUCE-RETURN merely calls ANALYZE on each of the continuation and the 
value, and calls PRODUCE-RETURN- 1. 

PRODUCE-RETURN- 1 checks to see whether the continuation is a KNOWN- 
FUNCTION. If so, the environment adjustment is computed, and code is generated 
in a way similar to previous routines. If not, the standard interface (involving 
(RETURN NIL)) is used. Notice the check to see if VAL is in fact **ONE** (the 
car of **ARGUMENT-REGISTERS**); if so, the redundant code (SETQ **ONE** **ONE**) 
is suppressed. 
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002 (DEFINE PRODUCE-RETURN 

00 3 (LAMBDA (CNODE RNL PROGNAME BLOCKFNS CENV FNS C) 

004 (LET ((CFM (CNODE\CFORM CNODE))) 

005 (ANALYZE (RETURNWAL CFM) 

006 RNL 

00 7 PROGNAME 

008 BLOCKFNS 

009 FNS 

010 (LAMBDA (VAL FNS) 

Oil (ANALYZE (RETURN\CONT CFM) 

012 RNL 

013 PROGNAME 

014 BLOCKFNS 

015 FNS 

016 (LAMBDA (CONT FNS) 

017 ( PRODUCE-RETURN- 1 

018 CNODE RNL PROGNAME BLOCKFNS 

019 CENV FNS C CFM VAL CONT)))))))) 
020 

021 (DEFINE PRODUCE-RETURN-1 

022 (LAMBDA (CNODE RNL PROGNAME BLOCKFNS CENV FNS C CFM VAL CONT) 

023 (IF (AND (EQ (TYPE (CNODEXCFORM (RETURN\CONT CFM))) 'CVARIABLE) 

024 (GET (CVARIABLEWAR (CNODE\CFORM (RETURN\CONT CFM))) 

025 'KNOWN -FUNCTION)) 

026 (LET ((KCFM (CNODEXCFORM 

027 (GET (CVARIABLEWAR 

028 (CNODEXCFORM (RETURNXCONT CFM))) 

029 'KNOWN-FUNCTION)))) 

030 (OR (EQ (TYPE KCFM) 'CONTINUATION) 

031 (ERROR '|Known function not CONTINUATION! CNODE 'FAIL-ACT)) 
32 (LET ((ENVADJ 

033 (ADJUST-KNOWNFN-CENV CENV 

034 (CVARIABLEWAR (CNODEXCFORM (RETURNXCONT CFM))) 

035 CONT 

036 (CONTINUATIONXFNP KCFM) 

037 (APPEND 

038 (CONTINUATIONXCLOSEREFS KCFM) 

039 (CONTINUATIONXCONSENV KCFM))))) 

040 (C "(PROGN 9( IF (EQ (CONTINUATIONXFNP KCFM) ' NOCLOSE ) 

041 (DEPROGNIFY 

042 (LAMBDACATE (LIST (CONTINUATIONXVAR KCFM)) 

043 (CONTINUATIONXTVARS KCFM) 

044 (CONTINUATIONXDEP KCFM) 

045 (LIST VAL) 

046 (REMARK-ON 

04 7 (GET (CVARIABLEWAR 

048 (CNODEXCFORM (RETURNXCONT CFM))) 

049 'KNOWN-FUNCTION)) 

050 ENVADJ 

051 NIL)) 

052 (PSETQIFY (LIST ENVADJ VAL) 

053 (LIST '**ENV** 

054 (CAR **ARGUMENT-REGISTERS**)))) 

055 (GO .(CONTINUATIONXNAME KCFM))) 

056 FNS))) 

057 (C "(PROGN (SETQ **FUN** .CONT) 

058 C(IF (NOT (EQ VAL (CAR **ARGUMENT-REGISTERS**))) 

059 "((SETQ .(CAR **ARGUMENT-REGISTERS**) ,VAL))) 

060 (RETURN NIL)) 

061 FNS)))) 
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LAMBDACATE generates code for invoking a NOCLOSE KNOWN- FUNCTION. It 
arranges for the arguments to be evaluated and put in the proper registers, and 
also performs some optimizations. 

VARS is a list of the variables which are to be bound. TVARS is a list 
of those variables (a subset of VARS) which will actually be passed through 
registers, as specified by the TVARS slot of the CLAMBDA or CONTINUATION; this 
is used for a consistency check on the optimizations of LAMBDACATE. DEP is the 
register depth of the function (the DEP slot). ARGS is a list of pieces of 
MacLISP code which have been generated for the arguments to the function. REN is 
a comment (usually one generated by REMARK-ON) to be included in the generated 
code for debugging purposes; this comment typically details the state of the 
environment and what variables are being passed through registers at this point. 
ENVADJ is a piece of MacLISP code (usually generated by ADJUST-KNOWNFN-CENV) to 
whose value **ENV** is to be set, to adjust the environment. BODY may be a list 
of pieces of MacLISP code which constitute the body of the known function, to be 
executed after the arguments are set up (typically because of a combination like 
((LAMBDA ...) ...)), or it may be NIL, implying that the caller of LAMBDACATE 
intends to generate a GO to the code. 

LAMBDACATE divides ARGS into three classes: (1) arguments which are 
themselves NOCLOSE KNOWN- FUNCTIONS -- such arguments actually have no actual run- 
time representation as a MacLISP data object, and so are not passed at all; (2) 
arguments whose corresponding variables are never referenced -- these are 
accumulated in EFFARGS, a list of arguments to be evaluated for effect only 
(presumably the optimizer eliminated those unreferenced arguments which had no 
side effects); and (3) arguments whose values are needed and are to be passed 
through the registers -- these are accumulated in REALARGS, and the corresponding 
variables in REALVARS. 

When this loop is done, (the reverse of) REALVARS should equal TVARS, for 
it is the set of actually passed arguments. 

The generated code first evaluates all the EFFARGS (if any), then sets 
all the proper registers to the REALARGS (this code is generated by PSETQ-TEMPS), 
then (after the remark REM) executed the BODY (which, if NIL, is empty). 

For example, consider generating code for: 

((LAMBDA (FAB) ... (FA) ...) 
(LAMBDA (X) ...) 
(CONS X Y) 
(PRINT Z)) 

where F denotes a NOCLOSE KNOWN -FUNCTION, and B is never referred to. Then the 
call to LAMBDACATE might look like this: 
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002 ;;; HANDLE CASE OF INVOKING A KNOWN NOCLOSE FUNCTION OR CONTINUATION. 

003 ;;; FOR AN EXPLICIT ((LAMBDA ... BODY) ...), BODY IS THE BODY. 

004 ;;; OTHERWISE, IT IS NIL. AND SOMEONE WILL DO AN APPROPRIATE GO LATER. 
005 

006 (DEFINE LAMBDACATE 

00 7 (LAMBDA (VARS TVARS DEP ARGS REM ENVADJ BODY) 

008 (LABELS ((LOOP 

009 (LAMBDA (V A REALVARS REALARGS EFFARGS) 

010 ;;REALVARS IS COMPUTED PURELY FOR. ERROR-CHECKING 
Oil (IF (NULL A) 

012 (LET ((B "(PROGN ?(PSETQ-TEMPS (NREVERSE REALARGS) DEP ENVADJ) 

013 ,REM 

014 G(DEPROGNIFY BODY))) 

015 (RV (NREVERSE REALVARS))) 

016 (IF (NOT (EQUAL RV TVARS)) 

017 (ERROR ' |TVARS screwup in LAMBDACATE | 

018 "((VARS = .VARS) 

019 (TVARS = , TVARS) 

020 (REALVARS « ,RV)) 

021 'FAIL-ACT)) 

022 (IF EFFARGS 

023 "(PROGN CEFFARGS P(DEPROGNIFY B)) 

024 B)) 

025 (COND ((LET ((KFN (GET (CAR V) 'KNOWN-FUNCTION))) 

026 (AND KFN 

027 (EQ (EQCASE (TYPE (CNOOEXCFORM KFN)) 

028 (CLAMBDA 

029 (CLAMBDA\FNP 

030 (CNO0E\CFORM KFN))) 

031 (CONTINUATION 

032 (CONTINUATION\FNP 

033 (CNODEXCFORM KFN)))) 

034 'NOCLOSE))) 

035 (LOOP (CDR V) (CDR A) REALVARS REALARGS EFFARGS)) 

036 ((OR (GET (CAR V) 'READ-REFS) 

037 (GET (CAR V) 'WRITE-REFS)) 

038 (LOOP (COR V) 

039 (CDR A) 

040 (CONS (CAR V) REALVARS) 

041 (CONS (CAR A) REALARGS) 

042 EFFARGS)) 

043 (T (LOOP (CDR V) 

044 (CDR A) 

045 REALVARS 

046 REALARGS 

047 (CONS (CAR A) EFFARGS)))))))) 

048 (LOOP VARS ARGS NIL NIL NIL)))) 
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(LAMBDACATE '(FAB) 
'(A) 
12 

'(<illegal> (CONS X43 Y69) (PRINT Z91)) 

<remark> 

**ENV** 

<body>) 



where <illegal> is an object that should never be looked at (see ANALYZE- 
CLANBDA); X43, Y69, and Z91 are pieces of code which refer to the variables X, 
Y, and Z; < remark > is some remark; the environment adjustment is assumed to be 
trivial; and <body> is the code for the body of the LAMBDA. The generated code 
would look something like this: 

(PROGN (PRINT Z91) 

(SETQ -12- (CONS X43 Y69)) 

<remark> 

<body>) 

Notice that LANBDACATE explicitly takes advantage of the fact that the execution 
of arguments for a combination may be arbitrarily reordered. 

The various PSETQ... routines generate code to perform Parallel SETQs, 
i.e. the simultaneous assignment of several values to several values. The 
parallel nature is important, because some of the values may refer to other 
registers being assigned to, and a sequential series of assignments might not 
work. 

The main routine here is PSETQIFY, which takes a list of arguments 
(pieces of MacLISP code which will generate values when executed at run-time) and 
a list of corresponding registers. One of two different methods is used 
depending on the number of values involved. Method 2 produces better code (this 
is obvious only when one understands the properties of the MacLISP compiler which 
will compile the MacLISP code into PDP-10 machine language). Unfortunately, it 
happened that when RABBIT was written there was a bug in the MacLISP compiler 
such that it often found itself unable to compile the code generated by Method 2 
Moreover, the primary maintainer of the MacLISP compiler was on leave for a year. 
For this reason Method 3 was invented, which always works, but is considerably 
more expensive in terms of the PDP-10 code produced. (I concerned myself with 
this low level of detail only for this routine, because the code it produces is 
central to the whole code generator, and so its efficiency is of the greatest 
importance.) In order to achieve the best code, I determined empirically that 
Method 2 never failed as long as fewer than five values were involved. I might 
also add that a Method 1 was once used, which happened to provoke a different bug 
in the MacLISP compiler; Method 2 was invented in an attempt to circumvent that 
first bug! Now that the maintainer of the MacLISP compiler (Jon L White) has 
returned, it may soon be possible to remove Method 3 from RABBIT; but I think 
this story serves as an excellent example of pragmatic engineering to get around 
immediate obstacles (also known as a "kludge"). 
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002 ;;; GENERATE PARALLEL SETQ' ING OF REGISTERS TO ARGS . 

003 ;;; RETURNS A LIST OF THINGS; ONE WRITES ?(PSETQIFY ...) WITHIN V 
004 

005 (DEFINE PSETQIFY 

006 (LAMBDA (ARGS REGISTERS) 

00 7 (IF (< (LENGTH ARGS) 5) 

008 (PSETQIFY-METHOD-2 ARGS REGISTERS) 

009 (PSETQIFY-METHOD-3 ARGS REGISTERS)))) 
010 

Oil 

012 (DEFINE PSETQIFY-METHOD-2 

013 (LAMBDA (ARGS REGISTERS) 

014 (LABELS ((PSETQ1 

015 (LAMBDA (A REGS QVARS SETQS USED) 

016 (IF (NULL A) 

017 (IF (NULL SETQS) 

018 NIL 

019 (IF (NULL (CDR SETQS)) 

020 "((SETQ .(CADAR SETQS) ,(CAR USED))) 

021 ;; IMPORTANT: DO NOT NREVERSE THE SETQS! 

022 ;;MAKES MACLISP COMPILER WIN BETTER. 

023 "(((LAMBDA .(NREVERSE QVARS) &SETQS) 

024 UNREVERSE USED))))) 

025 (IF (EQ (CAR A) (CAR REGS)) ;AVOID USELESS SETQ'S 

026 (PSETQ1 (CDR A) 

027 (CDR REGS) 
28 QVARS 

029 SETQS 

030 USED) 

031 ((LAMBDA (QV) 

032 (PSETQ1 (CDR A) 

033 (CDR REGS) 

034 (CONS QV QVARS) 

035 (CONS "(SETQ .(CAR REGS) ,QV) SETQS) 

036 (CONS (CAR A) USED))) 

037 (GENTEMP 'Q))))))) 

038 (PSETQ1 ARGS REGISTERS NIL NIL NIL)))) 
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Method 2 essentially uses local MacLISP LAMBDA variables to temporarily 
name the values before assignment to the registers, while Method 3 uses global 
variables. (Nethod 2 produces better code because the MacLISP compiler can 
allocate the local variables on a stack, one by one, and then pop them off in 
reverse order into the "registers".) Both methods perform two peephole 
optimizations: (1) If a value-register pair calls for setting the register to 
its own contents, that SETQ is eliminated. (2) If this elimination reduces the 
number of SETQs to zero or one, then NIL or a single SETQ is produced, rather 
than the more complicated and general piece of code. 

As examples, (PSETQIFY '(-12- -12- (CDR -13-)) '(-11- -12- -13-)) would 
produce: 

((LAMBDA (Q-43 Q-44) 

(SETQ -13- Q-44) 

(SETQ -11- Q-43)) 
-12- 
(CDR -13-)) 

(note that (SETQ -12- -12-) was eliminated), and 

(PSETQIFY '(-23- -21- -24- -25- -22-) '(-21- -22- -23- -24- -25-)) 
would produce: 

(PROG () (DECLARE (SPECIAL -21-TEMP -22-TEMP -23-TEMP -24— TEMP -25— TEMP) 
(SETQ -25-TEMP -22-) 
(SETQ -24--TEMP -25-) 
(SETQ -23--TEMP -24-) 
(SETQ -22--TEMP -21-) 
(SETQ -21--TEMP -23-) 
(SETQ -25- -25— TEMP) 
(SETQ -24- -24--TEMP) 
(SETQ -23- -23--TEMP) 
(SETQ -ZZ- -22— TEMP) 
(SETQ -21- -21— TEMP)) 

The only reason for using PROG is so that the DECLARE form could be included for 
the benefit of the MacLISP compiler. 

The examples here are slightly incorrect; PSETQIFY actually produces a 
list of MacLISP forms, so that when no SETQs are produced the resulting NIL is 
interpreted as no code at all. 

In principle the elimination of redundant SETQs should be performed 
before choosing which method to use, so that there will be a maximal chance of 
using the more efficient Method 2. I chose not to only so that the two methods 
would remain distinct pieces of code and thus easily replaceable. 
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002 (DEFINE PSETQIFY-METHOD-3 

003 (LAMBDA (ARGS REGISTERS) 

004 (LABELS ((PSETQ1 

005 (LAMBDA (A REGS QVARS SETQS USED) 

006 (IF (NULL A) 

007 (IF ( N "LL SETQS) 

008 NIL 

009 (IF (NULL (COR SETQS)) 

010 "((SETQ ,(CADAR SETQS) .(CADDR (CAR USED)))) 

011 "((PROG () (DECLARE (SPECIAL gQVARS)) BUSED PSETQS) ))) 

012 (IF (EQ (CAR A) (CAR REGS)) ;AVOID USELESS SETQ'S 

013 (PSETQ1 (COR A) 

014 (CDR REGS) 

015 QVARS 

016 SETQS 

017 USED) 

018 ((LAMBDA (QV) 

019 (PSETQ1 (COR A) 

020 (CDR REGS) 

2i (CONS QV QVARS) 

022 (CONS "(SETQ .(CAR REGS) .QV) SETQS) 

023 (CONS "(SETQ .QV .(CAR A)) USED))) 

024 (CATENATE (CAR REGS) * |-TEMP| ))))))) 

025 (PSETQ1 ARGS REGISTERS NIL NIL NIL)))) 
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*• ^cuw PSETQ .~ ARGS iS a handy routine whi ch calls PSETQ-ARGS-ENV with an ENVADJ 
of **ENV**, knowing that later the redundant "(SETQ **ENV** **ENV**)" will be 
eliminated. ' 

PSETQ-ARGS-ENV takes a set of arguments and an environment adjustment, 
and ""ranges to call PSETQIFY so as to set up the standard argument registers. 

**nuSrf of ap°p W »vr«J S d °. ne d T" dS ° n Whether the number of ^guments exceeds 
**NUNBER-OF-ARG-REGS**; if it does, then a list of the arguments (except the 

**ARr 1 NM?MT°pPrT 1 ,TPn P ,lt Sed in ** 0NE *** **ENV + CONT + ARG-REGS** is the same as 
^ARGUMENT-REGISTERS**, except that both the names **ENV** and **CONT** are 
adjoined to the front. It can be quite critical that **ENV** and the argument 
registers be assigned to in parallel, because the computation of the argument 
values may well refer to variables in the environment, whereas the environment 
adjustment may be taken from a closure residing in one of the argument registers. 

+u PSETQ " TENPS is similar to PSETQ-ARGS-ENV, but is used on registers other 
tnan the standard argument-passing registers. It takes ARGS and ENVADJ as 
before but also a depth DEP which is the number of the first register to be 
assigned to TEMPLOC is used to generate the register names, then **ENV** is 
tacked on and PSETQIFY does the real work. 

r»n« ak.a?v^ ALYZE *u a A simple l0 °P wnlcn ma P s over a list of cnode-trees and 
calls ANAL\ZE on each. A list of the results returned by ANALYZE is given to C. 
J^l/f." chai " ed _ through the calls to ANALYZE, so that all functions to be 
compiled later will have been accumulated properly. 
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002 (DEFINE PSETQ-ARGS 

003 (LAMBDA (ARGS) 

004 (PSETQ-ARGS-ENV ARGS '**ENV**))) 
005 

006 (DEFINE PSETQ-ARGS-ENV 

00 7 (LAMBDA (ARGS ENVADJ) 

008 (IF (> (LENGTH ARGS) (+ **NUMBER-OF-ARG-REGS** 1)) 

009 (PSETQIFY (LIST ENVADJ (CAR ARGS) (CONS 'LIST (CDR ARGS))) 

010 **ENV+CONT+ARG-REGS**) 

Oil (PSETQIFY (CONS ENVADJ ARGS) **ENV+CONT+ARG-REGS**) ) )) 

012 

013 (DEFINE PSETQ-TEMPS 

014 (LAMBDA (ARGS DEP ENVADJ) 

015 (DO ((A ARGS (CDR A)) 

016 (J DEP (+ J 1)) 

017 (R NIL (CONS (TEMPLOC J) R))) 

018 ((NULL A) 

019 (PSETQIFY (CONS ENVADJ ARGS) 

020 (CONS «**ENV** (NREVERSE R))))))) 
021 

022 

023 (DEFINE MAPANALYZE 

24 (LAMBDA (FLIST RNL PROGNAME BLOCKFNS FNS C) 

025 (LABELS ((LOOP 

026 (LAMBDA (F Z FNS) 

027 (IF (NULL F) 

028 (C (NREVERSE Z) FNS) 

029 (ANALYZE (CAR F) 

030 RNL 

031 PROGNAME 
32 BLOCKFNS 

033 FNS 

034 (LAMBDA (STUFF FNS) 

035 (LOOP (CDR F) 

036 (CONS STUFF Z) 

037 FNS))))))) 

038 (LOOP FLIST NIL FNS)))) 
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ANALYZE is the routine called to compile a piece of code which is 
expected to produce a value. ANALYZE itself is primarily a dispatch to 
specialists. For the "simple" case of a "trivial" form, TRIVIALIZE is used to 
generate the code. For the simple case of a CVARIABLE, ANALYZE simply uses 
LOOKUP ICATE to get the code for the variable reference. 

ANALYZE-CLAMBDA has three cases based on FNP. For type NIL, code is 
generated to create a full closure of the form (CBETA <value of progname> <tag> . 
<environment>). CONS-CLOSEREFS generates the code to add the CLOSEREFS to the 
existing consed environment for making this closure. For type EZCLOSE, just the 
environment part is created, again using CONS-CLOSEREFS. For type NOCLOSE, the 
generated "code" should never be referenced at all -- it is not even passed as an 
argument as such -- and so a little message to the debugger is returned as the 
"code", which of course must not appear in the final code for the module. For 
all three cases, the code for the function is added to the FNS list for later 
compilation. 

ANALYZE-CONTINUATION is essentially identical to ANALYZE-CLAMBDA. 
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002 (DEFINE ANALYZE 

003 (LAMBDA (CNODE RNL PROGNAME BLOCKFNS FNS C) 

004 (LET ((CFM (CNODE\CFORM CNODE))) 

005 (EQCASE (TYPE CFM) 

006 (TRIVIAL 

007 (C (TRIVIALIZE (TRIVIAL\NO0E CFM) RNL) FNS)) 

008 (CVARIABLE 

009 (C (LOOKUPICATE (CVARIABLEWAR CFM) RNL) FNS)) 

010 (CLAMBDA 

Oil ( ANALYZE -CLAMBOA CNODE RNL PROGNAME BLOCKFNS FNS C CFM)) 

012 (CONTINUATION 

013 (ANALYZE -CONTINUATION CNODE RNL PROGNAME BLOCKFNS FNS C CFM)) 

014 (CIF 

015 (ANALYZE-CIF CNODE RNL PROGNAME BLOCKFNS FNS C CFM)) 

016 (CLABELS 

017 (ANALYZE-CLABELS CNODE RNL PROGNAME BLOCKFNS FNS C CFM)) 

018 ( COMBINATION 

019 (ANALYZE-CCOMBINATION CNODE RNL PROGNAME BLOCKFNS FNS C CFM)) 

020 (RETURN 

021 (ANALYZE-RETURN CNODE RNL PROGNAME BLOCKFNS FNS C CFM)))))) 
022 

023 (DEFINE ANALYZE -CLAMBDA 

024 (LAMBDA (CNODE RNL PROGNAME BLOCKFNS FNS C CFM) 

025 (EQCASE (CLAMBDA\FNP CFM) 

026 (NIL 

027 (C "(CONS 'CBETA 

028 (CONS .PROGNAME 

029 (CONS ' ,(CLAMBDA\NAME CFM) 

030 .(CONS-CLOSEREFS (CLAMBDA\CLOSEREFS CFM) 

031 RNL)))) 

032 (CONS (LIST PROGNAME CNOOE NIL) FNS))) 

033 (EZCLOSE 

034 (C (CONS-CLOSEREFS (CLAMBDA\CLOSEREFS CFM) RNL) 

035 (CONS (LIST PROGNAME CNODE NIL) FNS))) 
36 (NOCLOSE 

037 (C 'iShctuldn't ever be seen - NOCLOSE CLAMBDA| 

38 (CONS (LIST PROGNAME CNODE RNL) FNS)))))) 
039 

040 (DEFINE ANALYZE-CONTINUATION 

041 (LAMBDA (CNODE RNL PROGNAME BLOCKFNS FNS C CFM) 

042 (EQCASE (CONTINUATION\FNP CFM) 

043 (NIL 

044 (C "(CONS 'CBETA 

045 (CONS .PROGNAME 

046 (CONS ',(CONTINUATION\NAME CFM) 

047 .(CONS-CLOSEREFS (CONTINUATION\CLOSEREFS CFM) 

048 RNL)))) 

049 (CONS (LIST PROGNAME CNODE NIL) FNS))) 

050 (EZCLOSE 

051 (C (CONS-CLOSEREFS (CONTINUATION\CLOSEREFS CFM) RNL) 

052 (CONS (LIST PROGNAME CNODE NIL) FNS))) 

053 (NOCLOSE 

054 (C '|Shou1dn't ever be seen - NOCLOSE CONTINUATION! 

055 (CONS (LIST PROGNAME CNODE RNL) FNS)))))) 
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ANALYZE-CIF merely calls ANALYZE recursively on the predicate, 
consequent, and alternative, and then uses CONDICATE to construct a MacLISP COND 
form. 

ANALYZE-CLABELS calls ANALYZE recursively on the body of the CLABELS, and 
then calls PRODUCE-LABELS to do the rest. (Unlike the other PRODUCE- functions, 
PRODUCE-LABELS does not depend on generating code which does not produce a value. 
It accepts an already-compiled body, and builds around that the framework for 
constructing the mutually referent functions. If the body was compiled using 
COMP-BODY, then the code generated by PRODUCE-LABELS will produce no value; but 
if the body was compiled using ANALYZE, then it will produce a value.) 
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002 (DEFINE ANALYZE-CIF 

003 (LAMBDA (CNODE RNL PROGNAME BLOCKFNS FNS C CFM) 

004 (ANALYZE (CIF\PRED CFM) 

005 RNL 

006 PROGNAME 
00 7 BLOCKFNS 

008 FNS 

009 (LAMBDA (PRED FNS) 

010 (ANALYZE (CIF\CON CFM) 
Oil RNL 

012 PROGNAME 

013 BLOCKFNS 

014 FNS 

015 (LAMBDA (CON FNS) 

016 (ANALYZE (CIF\ALT CFM) 

017 RNL 

018 PROGNAME 

019 BLOCKFNS 

020 FNS 

021 (LAMBDA (ALT FNS) 

022 (C (CONDICATE PRED CON ALT) 

023 FNS))))))))) 
024 

025 (DEFINE ANALYZE-CLABELS 

026 (LAMBDA (CNODE RNL PROGNAME BLOCKFNS FNS C CFM) 

027 (ANALYZE (CLABELS\BODY CFM) 

028 (ENVCARCDR (APPEND (CLABELS\FNENV CFM) 

029 (CLABELSNCONSENV CFM)) 

030 RNL) 

031 PROGNAME 

032 BLOCKFNS 

033 FNS 

034 (LAMBDA (LBOD FNS) 

035 (PRODUCE-LABELS CNODE LBOD RNL PROGNAME BLOCKFNS FNS C))))) 
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ANALYZE-CCONBINATION requires the function to be a CLAMBDA (for if it 
were not, then something too complicated for continuation-passing style is going 
on). ANALYZE is called on the body of the CLAMBDA, and then on all the arguments 
(using NAPANALYZE) ; finally LAMBDACATE is used to generate the code. 
(LAMBDACATE is much like PRODUCE-LABELS, in that it is handed a body, and whether 
the generated code produces a value depends only on whether the body does.) 

ANALYZE-RETURN is essentially just like ANALYZE-CCOMBINATION. 
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002 (DEFINE 

003 ANALYZE-CCOMBINATION 

004 (LAMBDA (CNODE RNL PROGNAME BLOCKFNS FNS C CFM) 

005 (LET ((FN (CNODENCFORM (CAR (CCOMBINATION\ARGS CFM))))) 

006 (IF (EQ (TYPE FN) 'CLAMBDA) 

00 7 (ANALYZE (CLAMBDA\BODY FN) 

008 (ENVCARCDR (CLAMB0A\ASETVARS FN) 

009 (REGSLIST FN T (ENVCARCDR (CLAMBDA\CONSENV FN) RNL))) 

010 PROGNAME 
Oil BLOCKFNS 

012 FNS 

013 (LAMBDA (BODY FNS) 

014 (MAPANALYZE 

015 (CDR (CCOMBINATION\ARGS CFM)) 

016 RNL 

017 PROGNAME 

018 BLOCKFNS 

019 FNS 

020 (LAMBDA (ARGS FNS) 

021 (C (LAMBDACATE (CLAMBDAWARS FN) 

022 (CLAMBDANTVARS FN) 

023 (CLAMBDAXDEP FN) 

024 ARGS 

025 (REMARK-ON (CAR (CCOMBINATION\ARGS CFM))) 

026 '**ENV** 

027 (SET-UP-ASETVARS BODY 

028 (CLAMBDA\ASETVARS FN) 

029 (REGSLIST FN NIL NIL))) 

030 FNS))))) 

031 (ERROR • |Non-trivial Function in ANALYZE-CCOMBINATION | CNODE 'FAIL-ACT))))) 
032 

033 (DEFINE ANALYZE-RETURN 

034 (LAMBDA (CNODE RNL PROGNAME BLOCKFNS FNS C CFM) 

035 (LET ((FN (CNODENCFORM (RETURN\CONT CFM)))) 

036 (IF (EQ (TYPE FN) 'CONTINUATION) 

037 (ANALYZE (CONTINUATION\BODY FN) 

038 (IF (CONTINUATIONYTVARS FN) 

039 (CONS (CONS (CAR (CONTINUATION\TVARS FN)) 

040 (TEMPLOC (CONTINUATION\DEP FN))) 

041 (ENVCARCDR (CONTINUATIONNCONSENV FN) RNL)) 

042 (ENVCARCDR (CONTINUATION\CONSENV FN) RNL)) 

043 PROGNAME 

044 BLOCKFNS 

045 FNS 

046 (LAMBDA (BODY FNS) 

04 7 (ANALYZE (RETURNWAL CFM) 

048 RNL 

049 PROGNAME 
50 BLOCKFNS 

051 FNS 

052 (LAMBDA (ARG FNS) 

053 (C (LAMBDACATE 

054 (LIST (CONTINUATIONWAR FN)) 

055 (CONTINUATIONNTVARS FN) 

056 (CONTINUATIONNDEP FN) 

057 (LIST ARG) 

058 (REMARK-ON (RETURN\CONT CFM)) 

059 «**ENV** 

060 BODY) 

061 FNS))))) 

062 (ERROR ' |Non-trivial Function in ANALYZE-RETURN | CNODE 'FAIL-ACT))))) 
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LOOKUPICATE (I make no apology for the choice of the name of this or any 
other function; suffice it to say that a function named LOOKUP already existed 
in the SCHEME interpreter) takes a variable name VAR and a rename list RNL, and 
returns a piece of MacLISP code for referring to that variable. If an entry is 
in RNL for the variable, that entry contains the desired code. Otherwise a 
global variable reference must be constructed. This will simply be a reference 
to the MacLISP variable, unless it is the name of a TRIVFN. In this case a GETL 
form is constructed. (This is a big kludge which does not always work, and is 
done this way as a result of a rather unclean hack in the SCHEME interpreter 
which interfaces MacLISP functions with SCHEME functions.) 

CONS-CLOSEREFS constructs a piece of MacLISP code which will cons onto 
the value of **ENV** all the variables in the set CLOSEREFS. This is a simple 
loop which uses LOOKUPICATE to generate code, and constructs a chain of calls to 
CONS. For example, (CONS-CLOSEREFS '(A B C) NIL) would produce: 

(CONS A (CONS B (CONS C **ENV**))) 

Notice the use of REVERSE to preserve an order assumed by other routines. 

OUTPUT-ASET takes two pieces of code: VARREF, which refers to a 
variable, and BODY, which produces a value to be assigned to the variable. From 
the form of VARREF a means of assigning to the variable is deduced. (This 
implies that OUTPUT-ASET knows about all forms of code which might possibly be 
returned by LOOKUPICATE and, a fortiori , which might appear in a RNL.) For 
example, if the reference is (CADR (CDDDDR **ENV**)), OUTPUT-ASET would generate 
(RPLACA (CDR (CDDDDR **ENV**)) <body>). 
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002 (DEFINE LOOKUPICATE ~ 

003 (LAMBDA (VAR RNL) 

004 t ((LAMBDA (SLOT) 

005 * (IF SLOT (CDR SLOT) 

006 (IF (TRIVFN VAR) 

007 "(GETL \VAR '(EXPR SUBR LSUBR)) 

008 VAR))) 

009 (ASSQ VAR RNL)))) 
010 

Oil (DEFINE CONS-CLOSEREFS 

012 (LAMBDA (CLOSEREFS RNL) 

013 (DO ((CR (REVERSE CLOSEREFS) (CDR CR)) 

014 (X '**ENV** "(CONS .(LOOKUPICATE (CAR CR) RNL) ,X))) 

015 ((NULL CR) X)))) 
016 

017 (DEFINE OUTPUT-ASET 

018 (LAMBDA (VARREF BODY) 

019 (COND ((ATOM VARREF) 

020 "(SETQ .VARREF .BODY)) 

021 ((EQ (CAR VARREF) 'CAR) 

022 "(CAR (RPLACA .(CADR VARREF) .BODY))) 

023 ((EQ (CAR VARREF) 'CADR) 

024 "(CAR (RPLACA (CDR .(CADR VARREF)) .BODY))) 

025 .((EQ (CAR VARREF) 'CADDR) 

026 "(CAR (RPLACA (CDDR .(CADR VARREF)) .BODY))) 

027 ((EQ (CAR VARREF) 'CADDDR) 

028 "(CAR (RPLACA (CDDDR .(CADR VARREF)) .BODY))) 

029 (T (ERROR • | Unknown ASET discipline - OUTPUT-ASET| VARREF 'FAIL-ACT))))) 
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CONDICATE takes the three conponents of an IF conditional, and constructs 
a MacLISP COND form. It also performs a simple peephole optimization: 

(COND (a b) 

(T (COND (c d) ...))) 

becomes: 

(COND (a b) (c d) ...) 

Also, DEPROGNIFY is used to take advantage of the fact that MacLISP COND clauses 
are implicitly PROGN forms. Thus: 

(CONDICATE '(NULL X) '(PROGN (PRINT X) Y) '(COND ((NULL Y) X) (T FOO))) 

would produce: 

(COND ((NULL X) (PRINT X) Y) ((NULL Y) X) (T FOO)) 

DECARCDRATE is a peephole optimizer which attempts to collapse CAR/CDR 
chains in a piece of MacLISP code to make it more readable. For example: 

(CAR (CDR (CDR (CAR (CDR (CAR (CDR (CDR (CDR (CDR X)))))))))) 

would become: 

(CADDR (CADR (CADDDR (CDR X)))) 

The arbitrary heuristic is that "A" should appear only initially in a "C...R" 
composition. DECARCDRATE also knows that MacLISP ordinarily has defined CAR/CDR 
compositions up to four long. 
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;;; CONDICATE TURNS AN IF INTO A COND; IN SO DOING IT TRIES TO MAKE THE RESULT "PRETTY. 

(DEFINE CONDICATE 

(LAMBDA (PREO CON ALT) 

(IF (OR (ATOM ALT) (NOT (EQ (CAR ALT) 'COND))) 
"(COND (,PRED gJDEPROGNIFY CON)) 

(T ?(DEPROGNIFY ALT))) 
"(COND j.PRED §(DEPROGNIFY CON)) 
?(CDR ALT))))) 

;;; DECARCDRATE MAKES CAR-COR CHAINS PRETTIER. 

(DEFINE DECARCDRATE 
(LAMBDA (X) 

(COND ((ATOM X) X) 

((EQ (CAR X) 'CAR) 
(IF (ATOM (CADR X)) 
X 
(LET ((Y (DECARCDRATE (CADR X)))) 

(COND ((EQ (CAR Y) 'CAR) "(CAAR .(CADR Y))) 
((EQ (CAR Y) 'CDR) "(CADR .(CADR Y))) 
((EQ (CAR Y) 'CDDR) "(CADDR .(CADR Y))) 
((EQ (CAR Y) 'CDDDR) "(CADDDR ,(CADR Y))) 
(T "(CAR .Y)))))) 
((EQ (CAR X) 'CDR) 
(IF (ATOM (CADR X)) 
X 
(LET ((Y (DECARCDRATE (CADR X)))) 

(COND ((EQ (CAR Y) 'CDR) "(CDDR .(CADR Y))) 

((EQ (CAR Y) 'CDDR) "(CDDDR .(CADR Y))) 
((EQ (CAR Y) 'CDDDR) "(CDDDDR .(CADR Y))) 
(T "(CDR ,Y)))))) 
(T X)))) 
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TRIVIALIZE is the version of ANALYZE which handles trivial forms. Recall 
that these are represented as pass-1 node-trees rather than as pass-Z cnode- 
trees. The task of TRIVIALIZE is to take such a node-tree and generate value- 
producing code. Recall that the subforms of a trivial form must themselves be 
trivial. 

For a CONSTANT, a quoted copy of the value of the constant is generated. 

For a VARIABLE, a reference to the variable is generated using 
LOOKUP I CATE. 

For an IF, the components are recursively given to TRIVIALIZE and then 
CONDICATE is used to generate a MacLISP COND form. 

For an ASET, a reference to the ASET variable is generated using 
LOOKUPICATE, and code for the body is generated by calling TRIVIALIZE 
recursively; then OUTPUT-ASET generates the code for the ASET. 

For a COMBINATION, the function must be either a TRIVFN or a LAJ1BDA- 
expression. For the former, a simple MacLISP function call is generated, after 
generating code for all the arguments. For the latter, TRIV-LAMBDACATE is 
invoked after generating code for the arguments and the LAMBDA body. 



TRIV-LAMBDACATE is, so to speak, a trivial version of LAMBDACATE. The 
arguments are divided into two classes, those which are referenced and those 
which are not (the possibility of a referenced argument which is a KNOWN- FUNCTION 
cannot arise). When this is done, a MacLISP ((LAMBDA ...) ...) form is 
generated, preceded by any unreferenced arguments (which presumably have side- 
effects). For example: 

(TRIV-LAMBDACATE '(VI V2 V3) 

'((CAR X) (PRINT Y) (CDR Z)) 
'(PROGN (PRINT VI) (LIST VI V3))) 

ought to produce: 

(PROGN (PRINT Y) 

((LAMBDA (VI V3) 



(COMMENT (VARS 
(PRINT VI) 
(LIST VI V3)) 



= (A C))) 



(CAR X) 
(CDR Z))) 

Note that a MacLISP LAMBDA body is an implicit PROGN. TRIV-LAMBDACATE also takes 
advantage of the ability to arbitrarily reorder the execution of arguments to a 
combination. 
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002 (DEFINE TRIVIALIZE 

003 (LAMBDA (NO0E RNL ) 

004 (LET ((FM (NODE\FORM NODE))) 

005 (EQCASE (TYPE FM) 

006 (CONSTANT •• ,(CONSTANT\VALUE FM)) 

007 (VARIABLE (LOOKUPICATE (VARIABLEWAR FM) RNL)) 

008 (IF (CONDICATE (TRIVIALIZE (IF\PRED FM) RNL) 

009 (TRIVIALIZE ( IF\CON FM) RNL) 

010 (TRIVIALIZE (IF\ALT FM) RNL))) 
Oil (ASET 

012 (OUTPUT-ASET (LOOKUPICATE (ASETWAR FM) RNL) 

013 (TRIVIALIZE (ASET\BODY FM) RNL))) 

014 (COMBINATION 

015 (LET ((ARGS (COMBINATION\ARGS FM))) 

016 (LET ((FN (NODENFORM (CAR ARGS)))) 

017 (IF (ANO (EQ (TYPE FN) 'VARIABLE) 

018 (VARIABLEVGLOBALP FN) 

019 (TRIVFN (VARIABLEWAR FN))) 

020 (CONS (VARIABLEWAR FN) 

021 (AMAPCAR (LAMBDA (A) (TRIVIALIZE A RNL)) 

022 (CDR ARGS))) 

023 (IF (EQ (TYPE FN) 'LAMBDA) 

024 (TRIV-LAMBDACATE 

025 (LAMBDA WARS FN) 

026 (AMAPCAR (LAMBDA (A) (TRIVIALIZE A RNL)) 
02 7 (CDR ARGS)) 

028 (TRIVIALIZE (LAMBDAXBODY FN) RNL)) 

029 (ERROR ' |Strange Trivial Function - TRIVIALIZE| 

030 NODE 

031 -FAIL-ACT)))))))))) 
032 

033 (DEFINE TRIV-LAMBDACATE 

034 (LAMBDA (VARS ARGS BODY) 

035 (LABELS ((LOOP 

036 (LAMBDA (V A REALVARS REALARGS EFFARGS) 

037 (IF (NULL A) 

038 (LET ((RV (NREVERSE REALVARS))) 

039 (OR (NULL V) 

040 (ERROR '|We blew it in TRIV-LAMBOACATE| V 'FAIL-ACT)) 

041 (LET ((B (IF RV 

042 "((LAMBDA .RV 

043 (COMMENT 

044 (VARS - , (MAP-USER-NAMES RV))) 

045 8(DEPROGNIFY BODY)) 

046 8( NREVERSE REALARGS)) 
04 7 BODY))) 

048 (IF EFFARGS 

049 "(PROGN PEFFARGS *(DEPROGNIFY B)) 

050 B))) 

051 (IF (OR (GET (CAR V) 'READ-REFS) 

052 (GET (CAR V) 'WRITE-REFS) ) 

053 (LOOP (CDR V) 

054 (CDR A) 

055 (CONS (CAR V) REALVARS) 

056 (CONS (CAR A) REALARGS) 

057 EFFARGS) 

058 (LOOP (CDR V) 

059 (CDR A) 

060 REALVARS 

061 REALARGS 

062 (CONS (CAR A) EFFARGS))))))) 

063 (LOOP VARS ARGS NIL NIL NIL)))) 
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We have examined the entire code generator, and now turn to high-level 
control routines. COMPILATE-ONE-FUNCTION is the highesWevel entry to the code 
generator, called by COMPILE. It takes a code-tree and the user-name for the 
function, and returns a complete piece of MacLISP code constituting a module for 
the user function. It generates a global name for use as the module name 
(PROGNANE), and invokes COMP I LATE-LOOP (which really ought to have been a LABELS 
function, but was too big to fit on the paper that way). The last argument is a 
list of two MacLISP forms; one causes a SCHEME compiled closure form (a CBETA 
list) to be put in the value cell of the user-name, so that it will be a globally 
defined SCHEME function, and the other creates a property linking the PROGNANE 
with the USERNAME for debugging purposes. 

COMPILATE-LOOP repeatedly calls COMPILATE, giving it the next function on 
the FNS list. Of course, the invocation of COMPILATE may cause new entries to 
appear on the FNS list. COMPILATE-LOOP iterates until the FNS list converges to 
emptiness. As it does so it takes each piece of generated code and strings it 
together as PROGBODY. It also calculates in TMAX the maximum over all MAXDEP 
slots; this is the only place where the MAXDEP slot is ever used. 

When FNS is exhausted, a module is constructed. This contains a comment, 
a MacLISP DEFUN form for defining a MacLISP function, a SETQ form to put the SUBR 
pointer in the value cell of the PROGNAME (for the benefit of code which creates 
CBETA forms), and extra "stuff". TMAX is used to generate a list of all 
temporary variables used in the module; a MacLISP SPECIAL declaration is created 
to advise the MacLISP compiler. 

USED-TEMPLOCS takes a TMAX value and generates the names of all temporary 
registers (whose names are of the form -nn-; standard argument registers are not 
included) up to that number. 
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(DEFINE COMPILATE-ONE-FUNCTION ICOMPLICATE-ONE-FUNCTION? ~ 

(LAMBDA (CNODE USERNAME) 

(LET ((PROGNAME (GEN-GLOBAL-NAME))) 
(COMPILATE-LOOP USERNAME 
PROGNAME 

(LIST (LIST USERNAME CNODE)) 
(LIST (LIST PROGNAME CNODE NIL)) 
NIL 

(LIST "(SETQ .USERNAME 

(LIST 'CBETA 

.PROGNAME 

' ,(CLAMBDA\NAME (CNODE\CFORM CNODE)))) 
"(DEFPROP .PROGNAME .USERNAME USER-FUNCTION)))))) 

(DEFINE COMPILATE-LOOP 

(LAMBDA (USERNAME PROGNAME BLOCKFNS FNS PROGBODY TMAX STUFF) 
(IE (NULL FNS) 

"(PROGN 'COMPILE 

(COMMENT MODULE FOR FUNCTION .USERNAME) 
(DEFUN .PROGNAME () 
(PROG () 

(DECLARE (SPECIAL .PROGNAME 8(USED-TEMPLOCS TMAX))) 
(GO (PROG2 NIL 

(CAR **ENV**) 

(SETQ **ENV** (CDR **ENV**)))) 
UNREVERSE PROGBOOY))) 
(SETQ .PROGNAME (GET • .PROGNAME 'SUBR)) 
8STUFF) 
(COMPILATE (CAR (CAR FNS)) 
(CADR (CAR FNS)) 
(CADDR (CAR FNS)) 
BLOCKFNS 
(CDR FNS) 
(LAMBDA (CODE NEWFNS) 

(LET ((CFM (CNODENCFORM (CADR (CAR FNS))))) 
(COMPILATE-LOOP 
USERNAME 
PROGNAME 
BLOCKFNS 
NEWFNS 
(NCONC (REVERSE (DEPROGNIFY1 CODE T)) 

(CONS (REMARK-ON (CADR (CAR FNS))) 
(CONS (EOCASE (TYPE CFM) 
(CLAMBOA 

(CLAMBDA\NAME CFM)) 
(CONTINUATION 
( CONT I NUAT IONXNAME CFM ) ) ) 
PROGBODY))) 
(MAX TMAX 

(EOCASE (TYPE CFM) 
(CLAMBDA 

(CLAMBDANMAXDEP CFM)) 
(CONTINUATION 
(CONTINUATIONXMAXDEP CFM)))) 



STUFF))))))) 



(DEFINE USED-TEMPLOCS 
(LAMBDA (N) 

(DO ((J (+ **NUMBER-OF-ARG-REGS** 1) (+ J 1)) 
(X NIL (CONS (TEMPLOC J) X))) 
((> J N) (NREVERSE X))))) 
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REMARK-ON takes a cnode for a CLAMBDA or CONTINUATION and generates a 
comment containing pertinent information about invoking that function. This 
comment will presumably be inserted in the output code to guide the debugging 
process. 

MAP-USER-NAMES takes a list of internal variable names and returns a list 
of the corresponding user names for the variables, as determined by the USER-NAME 
property. (If a variable is an internally generated one, e.g. for a 
continuation, then it will have no USER-NAME property, and the internal name 
itself is used. ) 
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002 (DEFINE REMARK-ON 

003 (LAMBOA (CNODE) 

004 (LET ((CFM (CNO0E\CFORM CNODE )) ) 

005 (LABELS ((REMARK1 

006 (LAMBOA (DEP FNP VARS ENV) 

00 7 "(COMMENT (DEPTH = ,DEP) 

008 (FNP = ,FNP) 

009 P( IF VARS "((VARS «= , (MAP-USER-NAMES VARS)))) 

010 P( IF ENV "((ENV = , (MAP-USER-NAMES ENV)))))))) 
Oil (EQCASE (TYPE CFM) 

012 (CLAMBDA 

013 (REMARK1 (CLAMBDAXDEP CFM) 

014 (CLAMBDAXFNP CFM) 

015 (IF (EQ (CLAMBDAXFNP CFM) 'NOCLOSE) 

016 (CLAMBDAXTVARS CFM) 

017 (CLAMBDAWARS CFM)) 

018 (APPEND (CLAMBDAXCLOSEREFS CFM) 

019 (CLAMBDAXCONSENV CFM)))) 

020 (CONTINUATION 

021 (REMARK1 (CONTINUATIONXDEP CFM) 

022 (CONTINUATIONXFNP CFM) 

023 NIL ;NEVER INTERESTING ANYWAY 

024 (APPEND (CONTINUATIONXCLOSEREFS CFM) 

025 (CONTINUATIONXCONSENV CFM))))))))) 
026 

027 

028 (DEFINE MAP-USER-NAMES 

029 (LAMBDA (VARS) 

030 (AMAPCAR (LAMBDA (X) (OR (GET X 'USER-NAME) X)) VARS))) 
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The next few pages contain routines which deal with files. COMFILE takes 
a file name, and compiles all the code in that file, producing an output file of 
MacLISP code suitable for giving to the MacLISP compiler. It also computes the 
CPU time required to compile the file. 

FN gets the given file name, processed and defaulted according to 
ITS/MacLISP standard conventions. RT and GCT save runtime and gc- run time 
information. 

IFILE and OFILE get MacLISP "file objects" created by the OPEN function, 
which opens the file specified by its first argument. (The output file names are 
initially " RABB OUTPUT", conforming to an ITS standard. These will later be 
renamed. ) 

*GLOBAL-GEN-PREFIX* is initialized as a function of the file name, to 
"directory=firstname" . This is to guarantee that the global symbols generated 
for two different compiled files of SCHEME code will not conflict should the two 
files be loaded into the same SCHEME together. (Notice the use of SYMEVAL. This 
is necessary because MacLISP allows names to be used in two different kinds of 
contexts, one meaning the "functional" value, and the other meaning the 
"variable" value. SCHEME does not make this distinction, and tries to make the 
functional value available, but does not do this consistently. This is a problem 
which results from a fundamental difference in semantics between SCHEME and 
MacLISP. For such variables as DEFAULTF and TYO, which in MacLISP are used for 
both purposes, it is necessary to use SYMEVAL to specify that the variable, 
rather than the function, is desired.) 

(SYMEVAL 'TYO) refers to the file object for the terminal; this is used 
to print out messages to the user while the file is being compiled. Various 
information is also printed to the file, including identification and a 
timestamp. The DECLARE form printed to the output file contains the names of the 
standard argument registers, and also **ENV**, **FUN**, and **NARGS**. (This is 
why USED-TEMPLOCS need not generate names of standard argument registers -- this 
single global declaration covers them.) The second DECLARE form defines to the 
MacLISP compiler a function called DISPLACE for obscure reasons having to do with 
the implementation of SCHEME macros. 

TRANSDUCE does the primary work of processing the input file. When it is 
done, another timestamp is printed to the output file, so that the real time 
consumed can be determined; then the runtime statistics are calculated and 
printed, along with the number of errors if any occurred. The output file is 
then renamed as "firstname LISP" and closed. The statistics message is returned 
so that it will be printed as the last message on the terminal. 
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002 (DEFINE COMFILE 

003 (LAMBDA (FNAME) 

004 (LET ((FN (DEFAULTF (MERGEF FNAME '(* >)))) 

005 (RT (RUNTIME)) 

006 (GCT (STATUS GCTIME))) 

007 (LET ((IFILE (OPEN FN 'IN)) 

008 (OFILE (OPEN (MERGEF • (_RABB_ OUTPUT) FN) 'OUT))) 

009 (SET 1 *GLOBAL-GEN-PREFIX* 

010 (CATENATE (CADAR (SYMEVAL 'DEFAULTF)) 
Oil >|.| 

012 (CADR (SYMEVAL 'DEFAULTF)))) 

013 (LET ((TN (NAMESTRING (TRUENAME IFILE)))) 

014 (PRINT "(COMMENT THIS IS THE RABBIT LISP CODE FOR ,TN) OFILE) 

015 (TIMESTAMP OFILE) 

016 (TERPRI OFILE) 

017 (TERPRI (SYMEVAL 'TYO)) 

018 (PRINC 'jjBeginning RABBIT compilation on | (SYMEVAL 'TYO)) 

019 (PRINC TN (SYMEVAL 'TYO))) 

020 (PRINT "(DECLARE (SPECIAL §**CONT+ARG-REGS** **ENV** **FUN** **NARGS**)) 

021 OFILE) 

022 (PRINT '(DECLARE (DEFUN DISPLACE (X Y) Y)) OFILE) 

023 (ASET' *TESTING* NIL) 

024 (ASET' *ERROR-COUNT* 0) 

025 (ASET' *ERROR-LIST* NIL) 

026 (TRANSDUCE IFILE 

027 OFILE 

028 (LIST NIL) 

029 (CATENATE '|INIT-| (CADR (TRUENAME IFILE)))) 

030 (TIMESTAMP OFILE) 

031 (LET ((X (*QUO (- (RUNTIME) RT) 1.0E6)) 

032 (Y (*QUO (- (STATUS GCTIME) GCT) 1.0E6))) 

033 (LET ((MSG "(COMPILE TIME: ,X SECONDS 

034 (GC TIME ,Y SECONDS) 

035 (NET ,(-$ X Y) SECONDS) 

36 a(IF (not (ZEROP *ERROR-COUNT* ) ) 

037 "((,*ERROR-COUNT* ERRORS)))))) 

038 (PRINT "(COMMENT ,MSG) OFILE) 
39 (RENAMEF OFILE 

040 (MERGEF (LIST (CADR FN) 'LISP) 

041 FN)) 

042 (CLOSE OFILE) 

043 MSG)))))) 
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TRANSDUCE processes forms from IFILE, one by one, calling PROCESS-FORM to 
do the real work on each one. PROCESS-FORM may print results on OFILE, and may 
also return a list of "random forms" to be saved. 

The business of "random forms" has to do with the fact that the file 
being compiled may contains forms which are not function definitions. The 
expectation is that when the file is loaded these forms will be evaluated during 
the loading process, and this is indeed true if the interpreter loads the 
original file of source forms. 

Now MacLISP provides a facility for evaluating random forms within a 
compiled file, but they are evaluated as MacLISP forms, not SCHEME forms. To get 
around this whole problem, I chose another solution. All the random forms in the 
file are accumulated, and then compiled as the body of a function named "INIT- 
firstname" . In this way, once the compiled code is loaded, the user is expected 
to say ( INIT-firstname) to cause the random forms to be evaluated. 

This idea would have worked perfectly except that files typically have a 
large number of random forms in them (macro definitions create one or two random 
forms as well as the definition of the macro-function). Putting all the random 
forms together in a single function often creates a function too big for RABBIT 
to compile, given PDP-10 memory limitiations. The four lines of code in 
TRANSDUCE for this have therefore been commented out with a ";" at the beginning 
of each line. 

The final solution was to compile each random form as its own function, 

and arrange for all these little functions to be chained; each one executes one 

random form and then calls the next. A call to INIT-firstname starts the chain 
going. 

This, then, is the purpose of the big DO loop in TRANSDUCE: to construct 
all the little functions for the random forms. The third argument to PROCESS- 
FORM may be NIL, which suppresses the printing of any messages on the terminal; 
this spares the user having to see tens or hundreds of uninteresting messages 
concerning the compilation of these initialization functions. However, so that 
the user will not be dismayed at the long pause, a message saying how many random 
forms there were is printed first. 

READIFY implements the MacLISP convention that if the value of the 
variable READ is non-nil, then that value is the read-in function to use; while 
if it is NIL, then the function READ is the read-in function. (This "hook" is 
the method by which CGOL works, for example.) 
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002 (DEFINE TRANSDUCE 

003 (LAMBDA (IFILE OFILE EOF INITNAME) 

004 (LABELS ((LOOP 

005 (LAMBDA (FORM RANDOM-FORMS) 

006 (IF (EQ FORM EOF) 

007 (DO ((X (GENTEMP INITNAME) (GENTEMP INITNAME)) 

008 # (Y NIL X) 

009 (Z RANDOM-FORMS (CDR Z))) 

010 ((NULL Z) 

OH (IF RANDOM-FORMS 

012 (PRINT "(.(LENGTH RANDOM-FORMS) 

0X3 RANDOM FORMS IN FILE TO COMPILE) 

01 4 (SYMEVAL 'TYO))) 

015 (IF Y (PROCESS-FORM "(DECLARE (SPECIAL ,Y)) 

016 0FILE 

017 T)) 

018 (PROCESS-FORM "(DEFINE .INITNAME 

019 (LAMBDA () .(IF Y (LIST Y) NIL))) 

020 OFILE 

021 T)) 

022 (IF Y (PROCESS-FORM "(DECLARE (SPECIAL ,Y)) 

023 0FILE 

024 NIL)) 

025 (PROCESS-FORM "(DEFINE ,X 

026 (LAMBDA () 

027 (BLOCK .(CAR Z) 

028 (IF / 

029 (LIST Y) 

030 NIL)))) 

031 0F | LE 

032 NIL)) 

(PROCESS -FORM 

"(DEFINE .INITNAME 

(LAMBDA () (BLOCK 8RANDOM- FORMS NIL NIL))) 
OFILE) 

037 (LET ((X (PROCESS-FORM FORM OFILE T))) 

038 (LOOP (READ IF Y IFILE EOF) (NCONC X RANDOM-FORMS))))))) 

039 (LOOP (READIFY IFILE EOF) NIL)))) 
040 

041 

042 (DEFINE READIFY ;FUNNY MACLISP CONVENTION - READIFY'LL DO THE JOB! 

043 (LAMBDA. (IFILE EOF) 

044 (IF (SYMEVAL 'READ) 

0*5 (APPLY (SYMEVAL 'READ) IFILE EOF) 

046 (READ IFILE EOF)))) 



033 
034 
035 
036 
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PROCESS-FORM takes a form, an output file, and a switch saying whether to 
be "noisy". The form is broken down into one of many special cases and processed 
accordingly. The returned value is a list of "random forms" for TRANSDUCE to 
save for later handling. 

An atom, while pretty useless, is transduced directly to the output file. 

A DEFINE-form, which defined a function to be compiled, is given to 
PROCESS-DEFINE-FORM. This is the interesting case, which we will discuss on the 
next page. 

A special hack handed down from MacLISP is that a form (PROGN 'COMPILE 
...) (and, for SCHEME, the analogous (BLOCK 'COMPILE ...)) should be treated as 
if all the subforms of the PROGN (or BLOCK) after the first should be processed 
as if they had been read as "top-level" forms from the file. This allows a macro 
call to generate more than one form to be compiled, for example. It is necessary 
to accumulate all the results of the calls to PROCESS-FORM so that they may be 
collectively returned. 

A PROCLAIM form contains a set of forms to be evaluated by RABBIT at 
compile time. The evaluation is accomplished by constructing a LAMBDA form and 
using the SCHEME primitive ENCLOSE to create a closure, and then invoking the 
closure. As a special service, the variable OFILE is made apparent to the 
evaluated form so that it can print information to the output file if desired. 

A DECLARE form is meant to be seen by the MacLISP compiler, and so it is 
passed on directly. 

A COMMENT form is simply eliminated. (It could be passed through 
directly with no harm.) 

A DEFUN form is passed directly, for compilation by the MacLISP compiler. 

A form which is a macro call is expanded and re-processed. As a special 
hack, those which are calls to DEFMAC, SCHMAC, or MACRO are also evaluated 
(MacLISP evaluation serves), so that the defined macro will be available for 
compiling calls to it later in the file. 

Any other form is considered "random", and is returned to TRANSDUCE 
provided *BUFFER-RANDOM-FORMS* is non-NIL. This switch is provided in case it is 
necessary to force a random form (e.g. an ALLOC form) to be output early in the 
file. In this case any random forms must be MacLISP-evaluable as well as SCHEME- 
evaluable. (This requirement is the reason RABBIT has random forms like "(SET' 
FOO ...)"; SETQ is unacceptable to SCHEME, while ASET' is unacceptable to 
MacLISP, but SET' happens to work in both languages for setting a global 
variable.) RABBIT itself sets *BUFFER-RANDOM-FORMS* to NIL on page 1 in a 
PROCLAIM form. 
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002 (SET' *OPTIMIZE* T) 
003 

004 (SET' *BUFFER-RANDOM-FORMS* T) 
005 

006 (DEFINE PROCESS-FORM 

007 (LAMBDA (FORM OFILE NOISYP) 

008 (COND ((ATOM FORM) 

009 (PRINT FORM OFILE) 

010 NIL) 

Oil ((EO (CAR FORM) 'DEFINE) 

012 (PROCESS-DEFINE-FORM FORM OFILE NOISYP) 

013 NIL) 

014 ((AND (MEMO (CAR FORM) '(BLOCK PROGN)) 

015 (EQUAL (CADR FORM) "COMPILE)) 

016 (DO ((F (CDDR FORM) (CDR F)) 

017 (Z NIL (NCONC Z (PROCESS-FORM (CAR F) OFILE NOISYP)))) 

018 ((NULL F) Z))) 

019 ((EQ (CAR FORM) 'PROCLAIM) 

020 (AMAPC (LAMBDA (X) ((ENCLOSE "(LAMBDA (OFILE) ,X)) OFILE)) 

021 (CDR FORM)) 

022 NIL) 

023 ((EO (CAR FORM) 'DECLARE) 

024 (PRINT FORM OFILE) 

025 NIL) 

026 ((EO (CAR FORM) 'COMMENT) 

027 NIL) 

028 ((EO (CAR FORM) 'DEFUN) 

029 (PRINT FORM OFILE) 

030 NIL) 

031 ((AND (ATOM (CAR FORM)) 

032 (EQ (GET (CAR FORM) 'AINT) 'AMACRO) 

033 (NOT (EQ (GET (CAR FORM) 'AMACRO) 'AFSUBR))) 

034 (IF (MEMO (CAR FORM) '(DEFMAC SCHMAC MACRO)) 

035 (EVAL FORM)) 

036 (PROCESS-FORM (MACRO-EXPAND FORM) OFILE NOISYP)) 

037 (T (COND (*BUFFER-RANDOM-FORMS* (LIST FORM)) 

038 (T (PRINT FORM OFILE) NIL)))))) 
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PROCESS-DEFINE-FORM disambiguates the three DEFINE formats permitted in 
SCHEME : 

(DEFINE FOO (LAMBDA (X Y) ...)) 
(DEFINE FOO (X Y) ...) 
(DEFINE (FOO X Y) ...) 

and constructs an appropriate LAMBDA-expression in standard form. 

PROCESS-DEFINITION takes this LAMBDA-expression and compiles it, after 
some error checks. It then cleans up, and if desired prints a message on the 
console to the effect that the function was compiled successfully. 

CLEANUP is used to clear out garbage left around by the compilation 
process which is no longer needed (but is useful during the compilation, whether 
for compilation proper or only for debugging should the compilation process 
fail) . 

REPLACE has to do with macros which displace calls to them with the 
expanded forms, but retain enough information to undo this. REPLACE undoes this 
and throws away the saved information. (The DISPLACE facility is normally turned 
off anyway, and is used only to make the compiler run faster when it itself is 
being run under the SCHEME interpreter. This was of great use when RABBIT wasn't 
running well enough to compile itself!) 

GENFLUSH removes from the MacLISP OBARRAY all the temporary generated 
symbols created by GENTEMP. 

The MAPATOMS form removes from every atom on the OBARRAY the properties 
shown. This takes more time but less space than recording exactly which atoms 
had such properties created for them. 
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(DEFINE PROCESS-DEFINE-FORM 

(LAMBDA (FORM OFILE NOISYP) 

(COND ((ATOM (CADR FORM)) 

(PROCESS-DEFINITION FORM 
OFILE 
NOISYP 
(CADR FORM) 

(IF (NULL (CDDOR FORM)) 
(CADDR FORM) 
"(LAMBDA .(CADDR FORM) 

(BLOCK . .(CDDDR FORM)))))) 
(T (PROCESS-DEFINITION FORM 

OFILE 
NOISYP 
(CAADR FORM) 
"(LAMBDA .(CDADR FORM) 

(BLOCK . ,(CDDR FORM)))))))) 

(DEFINE PROCESS-DEFINITION 

(LAMBDA (FORM OFILE NOISYP NAME LAMBDA-EXP) 

(COND ((NOT (EQ (TYPEP NAME) 'SYMBOL)) 

(WARN |Function Name Not SYMBOL | NAME FORM)) 
((OR (NOT (EQ (CAR LAMBDA-EXP) 'LAMBDA)) 
(AND (ATOM (CADR LAMBDA-EXP)) 

(NOT (NULL (CADR LAMBDA-EXP))))) 
(WARN |Ma1 formed LAMBDA-expression| LAMBDA-EXP FORM)) 
(T (PRINT (COMPILE NAME 

LAMBDA-EXP 
NIL 

♦OPTIMIZE*) 
OFILE) 
(CLEANUP) 
( IF NOISYP 

(PRINT (LIST NAME 'COMPILEO) 
(SYMEVAL 'TYO))))))) 

(DEFINE CLEANUP 

(LAMBDA () 

(BLOCK (REPLACE) 
(GENFLUSH) 
(MAPATOMS '(LAMBDA (X) 

(REMPROP X 
(REMPROP X 
(REMPROP X 
(REMPROP X 
(REMPROP X 
(REMPROP X 
(REMPROP X 



•READ-REFS) 

•WRITE-REFS) 

' NODE ) 

'BINDING) 

•USER-NAME) 

'KNOWN-FUNCTION) 

' EASY-LABELS-FUNCTION )))))) 
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SEXPRFY and CSEXPRFY are debugging aids which take a node-tree or cnode- 
tree and produce a fairly readable S-expression version of the code it 
represents. They are used by the SX and CSX macros defined earlier. The USERP 
switch for SEXPRFY specifies whether internal variables names or user variable 
names should be used in the construction. 
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002 ;;; INVERSE OF ALPHATIZE. USED BY SX, E.G., FOR DEBUGGING". 
003 

004 (DEFINE SEXPRFY 

005 (LAMBDA (NODE USERP) 

006 (LET ((FM (NODE\FORM NODE))) 
00 7 (EQCASE (TYPE FM) 

008 (CONSTANT "(QUOTE ,(CONSTANT\VALUE FM))) 

009 (VARIABLE (IF (AND USERP (NOT (VARIABLEVGLOBALP FM))) 

010 (GET (VARIABLEWAR FM) 'USER-NAME) 

011 (VARIABLEWAR FM))) 

012 (LAMBDA "(LAMBDA ,(IF USERP (LAMBDA\UVARS FM) (LAMBDAWARS FM)) 

013 .(SEXPRFY (LAMBDANBODY FM) USERP))) 

014 (IF "(IF .(SEXPRFY (IF\PRED FM) USERP) 

015 .(SEXPRFY (IF\CON FM) USERP) 

016 -.(SEXPRFY (IF\ALT FM) USERP))) 

017 (ASET "(ASET' .(IF (AND USERP (NOT (ASET\GLOBALP FM))) 

018 (GET (ASETWAR FM) 'USER-NAME) 

019 (ASETWAR FM)) 

020 .(SEXPRFY (ASET\BOOY FM) USERP))) 
°21 (CATCH "(CATCH .(IF USERP 

022 (GET (CATCHWAR FM) "USER-NAME) 

023 (CATCHWAR FM)) 

024 .(SEXPRFY (CATCH\BODY FM) USERP))) 

025 (LABELS "(LABELS .(AMAPCAR (LAMBDA (V D) "(.(IF USERP 

026 (GET V 'USER-NAME) 

027 v) 

028 .(SEXPRFY D USERP))) 

029 (LABELSXFNVARS FM) 

030 (LABELSXFNDEFS FM)) 

031 .(SEXPRFY (LABELSXBODY FM) USERP))) 

032 (COMBINATION 

033 (AMAPCAR (LAMBDA (A) (SEXPRFY A USERP)) 

034 (COMBINATIONXARGS FM) )))))) 
035 

036 (DEFINE CSEXPRFY 

03 7 (LAMBDA (CNODE) 

038 (LET ((CFM (CNODENCFORM CNODE))) 

039 (EQCASE (TYPE CFM) 

040 (TRIVIAL "(TRIVIAL .(SEXPRFY ( TRIVIALXNODE CFM) NIL))) 

04 1 (CVARIABLE (CVARIABLEWAR CFM)) 

042 (CLAMBDA "(CLAMBDA .(CLAMBDAWARS CFM) 

043 .(CSEXPRFY (CLAMBDANBODY CFM)))) 

044 (CONTINUATION 

045 "(CONTINUATION ( .(CONTINUATIONWAR CFM)) 

046 .(CSEXPRFY (CONTINUATION\BODY CFM)))) 

047 (CIF "(CIF .(CSEXPRFY (CIF\PRED CFM)) 

048 .(CSEXPRFY (CIF\CON CFM)) 

049 .(CSEXPRFY (CIF\ALT CFM)))) 

050 (CASET "(CASET' .(CSEXPRFY (CASET\CONT CFM)) 

051 .(CASETWAR CFM) 

052 .(CSEXPRFY (CASETNBODY CFM)))) 

053 (CLABELS "(CLABELS .(AMAPCAR (LAMBDA (V D) "( ,V 

054 .(CSEXPRFY D))) 

055 (CLABELSNFNVARS CFM) 

056 (CLABELSXFNDEFS CFM)) 

057 .(CSEXPRFY (CLABELSNBODY CFM)))) 

058 (COMBINATION 

059 (AMAPCAR CSEXPRFY (CCOMBINATION\ARGS CFM))) 

060 (RETURN 

061 "(RETURN .(CSEXPRFY (RETURN\CONT CFM)) 

062 .(CSEXPRFY (RETURNWAL CFM)))))))) 
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CHECK-NUMBER-OF-ARGS is used by COMPILE and ALPHA-COMBINATION to make 
sure that function calls and definitions agree on the number of arguments taken 
by a function. If a mismatch is detacted, a warning is issued. This check 
frequently catches various typographical errors. The argument DEFP is NIL unless 
this call is on behalf of a definition rather than a call. The DEFINED property 
is used only so that a more comprehensive warning may be given. 

*EXPR and *LEXPR are two special forms suitable for use in a PROCLAIM 
form for declaring that certain names refer to MacLISP functions rather than 
SCHEME functions. An example, for PRINT-SHORT, occurs on page 1 of RABBIT. 

DUMPIT establishes a simple user interface for RABBIT. After loading a 
compiled RABBIT into a SCHEME run-time system, the call (DUMPIT) initializes the 
RABBIT, then suspends the MacLISP environment, with an argument which is an ITS 
command for dumping the core image. When this core image is later loaded and 
resumed, DUMPIT prints "FILE NAME:" and reads a line. All the user need do is 
typoe a file name and a carriage return to compile the file. When this is done, 
the call to QUIT kills the RABBIT job. 

STATS prints out statistics accumulated in the counters listed in *STAT- 
VARS*. RESET-STATS resets all the counters. 
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002 (DEFINE CHECK-NUMBER-OF-ARGS 

003 (LAMBDA (NAME NARGS DEFP) 

004 (OR (GETL NAME '(*LEXPR LSUBR)) 

005 (LET ((N (GET NAME ' NUMBER-OF-ARGS) ) ) 

006 (IF N 

00 7 (IF (NOT (= N NARGS)) 

008 (IF DEFP 

009 (WARN (definition disagrees with earlier use on number of argsl 

010 NAME 

011 NARGS 

012 N ) 

013 (IF (GET NAME 'DEFINE0) 

014 (WARN |use disagrees with definition on number of args | 

015 NAME 

016 NARGS 

017 N) 

018 (WARN |two uses disagree before definition on number of aras I 

019 NAME 

020 NARGS 

021 N)))) 

°22 (PUTPROP NAME NARGS 'NUMBER-OF-ARGS)) 

023 (IF DEFP (PUTPROP NAME 'T 'DEFINED)))))) 

024 
025 

026 (DEFUN *EXPR FEXPR (X) 

027 (MAPCAR '(LAMBDA (Y) (PUTPROP Y 'T '*EXPR)) X)) 
028 

029 (DEFPROP *EXPR AFSUBR AMACRO) (DEFPROP *EXPR AMACRO AINT) 
030 

031 (DEFUN *LEXPR FEXPR (X) 

032 (MAPCAR '(LAMBOA (Y) (PUTPROP Y 'T '*LEXPR)) X)) 
033 

034 (DEFPROP *LEXPR AFSUBR AMACRO) (DEFPROP *LEXPR AMACRO AINT) 

035 

036 

037 (DEFINE DUMPIT 

038 (LAMBDA ( ) 

039 (BLOCK (INIT-RABBIT) 

040 (SUSPEND ' | :PDUMP DSK: SCHEME ;TS RABBIT |) 

041 (TERPRI) 

042 (PRINC '|Fi1e name: |) 

043 (COMFILE (READLINE)) 

044 (QUIT)))) 
045 

046 (DEFINE STATS 

04 7 (LAMBDA ( ) 

048 (AMAPC (LAMBDA (VAR) 

04 9 (BLOCK (TERPRI) 
°50 (PRIN1 VAR) 

051 (PRINC '|=|) 

052 (PRIN1 (SYMEVAL VAR)))) 

053 *STAT-VARS*))) 
054 

055 (DEFINE RESET-STATS 

056 (LAMBDA () (AMAPC (LAMBOA (VAR) (SET VAR 0)) *STAT-VARS* ) )) 
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* SIDE EFFECTS 017 005 

* EXPR FEXPR 071 026 

* EXPR PROPERTY 071 027 

* EXPR AMACRO 071 029 

* EXPR AINT 071 029 

* LEXPR FEXPR , 071 031 

* LEXPR PROPERTY 071 032 

* LEXPR AMACRO 071 034 

* LEXPR AINT 071 034 

+ SIDE EFFECTS . 017 003 

' SIDE EFFECTS 017 004 

/ SIDE EFFECTS 017 006 

< SIDE EFFECTS 017 008 

= SIDE EFFECTS 017 00 7 

> SIDE EFFECTS 017 009 

ACCESSFN PDEFINE 004 002 

ACCESSFN MACLISP MACRO 004 004 

ADDPR0P SCHEME FUNCTION 006 004 

ADJ0IN SCHEME FUNCTION 006 029 

AOJUST-KNOWNFN-CENV SCHEME FUNCTION 052 002 

AINT PROPERTY 071 029 

AINT PROPERTY 071 034 

ALPHA-ASET SCHEME FUNCTION 010 010 

ALPHA-ATOM SCHEME FUNCTION 009 032 

ALPHA-BLOCK SCHEME FUNCTION 011 011 

ALPHA-CATCH SCHEME FUNCTION '. . . 010 029 

ALPHA-COMBINATION SCHEME FUNCTION 011 037 

ALPHA- IF SCHEME FUNCTION 010 002 

ALPHA-LABELS SCHEME FUNCTION 010 040 

ALPHA-LABELS-DEFN SCHEME FUNCTION 011 002 

ALPHA-LAMBDA SCHEME FUNCTION 009 042 

ALPHATIZE SCHEME FUNCTION 009 005 

AMACR0 PROPERTY 071 029 

AMACRO PROPERTY 071 034 

ANALYZE SCHEME FUNCTION 058 002 

ANALYZE-CCOMBINATION SCHEME FUNCTION 060 003 

ANALYZE-CIF SCHEME FUNCTION 059 002 

ANALYZE-CLABELS SCHEME FUNCTION 059 025 

ANALYZE-CLAMBDA SCHEME. FUNCTION 058 023 

ANALYZE-CONTINUATION SCHEME FUNCTION 058 040 

ANALYZE-RETURN SCHEME FUNCTION 060 033 

APPEND SIDE EFFECTS 017 057 

ASET DATA TYPE 008 042 

ASK PDP-10 SCHEME MACRO 003 024 

ASS< ? SIDE EFFECTS 017 



ATOM 



059 

SIDE EFFECTS 017 045 

052 
030 



BIGP SIDE EFFECTS 017 

BIND-ANALYZE SCHEME FUNCTION 034 

BIND-ANALYZE-CASET SCHEME FUNCTION 035 040 

BIND-ANALYZE-CCOMBINATION SCHEME FUNCTION " 037 002 

BIND-ANALYZE-CIF SCHEME FUNCTION 035 030 

BIND-ANALYZE-CLABELS SCHEME FUNCTION 036 002 

BIND-ANALYZE-CLAMBDA SCHEME FUNCTION 035 002 

BIND-ANALYZE-CONTINUATION SCHEME FUNCTION .... 035 016 

BIND-ANALYZE-RETURN SCHEME FUNCTION 



BIND-CCOMBINATION-ANALYZE SCHEME FUNCTION 037 

CAAAAR SIDE EFFECTS 017 

CAAADR SIDE EFFECTS 017 

CAAAR SIDE EFFECTS 017 

CAADAR SIDE EFFECTS 017 „ fcW 

CAADDR SIDE EFFECTS 017 027 

£ AA [! R SIDE EFFECTS 017 017 

CAAR SIDE EFFECTS 017 012 

028 



036 033 
043 
024 
025 
016 
026 



CADAAR SIDE EFFECTS 017 „..„ 

CADADR SIDE EFFECTS 017 029 

CADAR SIDE EFFECTS 017 018 
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CADDAR SIDE EFFECTS 017 030 

CADDDR SIDE EFFECTS 017 031 

CAOOR SIDE EFFECTS 017 019 

CADR SIDE EFFECTS 017 013 

CAR SIDE EFFECTS 017 010 

CASET DATA TYPE 026 039 

CATCH DATA TYPE 008 046 

CATENATE MACLISP MACRO 002 016 

CCOMBINATION DATA TYPE 026 047 

CDAAAR SIDE EFFECTS 017 032 

CDAADR SIDE EFFECTS 017 033 

CDAAR SIDE EFFECTS 017 020 

CDADAR SIDE EFFECTS 017 034 

CDADDR SIDE EFFECTS 017 035 

CDADR SIDE EFFECTS 017 021 

CDAR SIDE EFFECTS 017 014 

CDDAAR SIDE EFFECTS 017 036 

CDDADR SIDE EFFECTS 017 037 

CDDAR SIDE EFFECTS . . 017 022 

CODDAR SIDE EFFECTS 017 038 

CDDDDR SIDE EFFECTS 017 039 

CDDDR SIDE EFFECTS 017 023 

CDDR SIDE EFFECTS 017 015 

COR SIDE EFFECTS 017 Oil 

CENV-ANALYZE SCHEME FUNCTION 032 018 

CENV-CCOMBINATION-ANALYZE SCHEME FUNCTION 033 028 

CENV-TRIV-ANALYZE SCHEME FUNCTION 033 004 

CHECK-COMBINATION-PEFFS SCHEME FUNCTION 016 002 

CHECK-NUMBER-OF-ARGS SCHEME FUNCTION 071 002 

CIF DATA TYPE 026 038 

CL PDP-10 SCHEME MACRO 007 049 

CLABELS DATA TYPE 026 040 

CLAMBDA DATA TYPE 026 017 

CLEANUP SCHEME FUNCTION 069 038 

CLOBBER MACLISP MACRO 004 025 

CLOSE-ANALYZE SCHEME FUNCTION 040 002 

CNAME MACLISP MACRO 004 015 

CNODE DATA TYPE 026 007 

CNODIFY SCHEME FUNCTION 027 002 

COMBINATION DATA TYPE 008 055 

COMF ILE SCHEME FUNCTION 066 002 

COMP-BODY SCHEME FUNCTION 045 006 

COMPILATE SCHEME FUNCTION 041 018 

COMPILATE-LOOP SCHEME FUNCTION 064 017 

COMPILATE-ONE-FUNCTION SCHEME FUNCTION 064 002 

COMPILE SCHEME FUNCTION 007 010 

COMPONENT-NAMES PROPERTY . . . 005 060 

CONDICATE SCHEME FUNCTION 062 004 

CONS SIDE EFFECTS 017 055 

CONS-CLOSEREFS SCHEME FUNCTION 061 011 

CONSTANT DATA TYPE 008 028 

CONTINUATION DATA TYPE 026 036 

CONVERT SCHEME FUNCTION 027 006 

CONVERT-ASET SCHEME FUNCTION 029 002 

CONVERT-CATCH SCHEME FUNCTION 029 024 

CONVERT-COMBINATION SCHEME FUNCTION 031 014 

CONVERT- IF SCHEME FUNCTION 028 024 

CONVERT-LABELS SCHEME FUNCTION 030 006 

CONVERT-LAMBDA-FM SCHEME FUNCTION 028 009 

COPY-COOE SCHEME FUNCTION 025 002 

COPY-NODES SCHEME FUNCTION 025 007 

CSEXPRFY SCHEME FUNCTION 070 036 

CSX MACL ISP MACRO 003 029 

CVARIABLE DATA TYPE 026 015 

CXR SIDE EFFECTS 017 040 

DECARCORATE SCHEME FUNCTION 062 015 

DEFINE ^DEFINE 001 062 
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DEFMAC PDEFINE 001 063 

DEFTYPE PDEFINE 005 002 

DEFTYPE MACLISP MACRO 005 008 

DELPR0P SCHEME FUNCTION 006 012 

DEPROGNIFY MACLISP MACRO 042 005 

DEPROGNIFY1 SCHEME FUNCTION 042 009 

DEPTH-ANALYZE SCHEME FUNCTION 038 010 

DISPLACE EXPR 001 006 

DUMPIT SCHEME FUNCTION 071 037 

EFFDEF PDEFINE 016 039 

EFFDEF MACLISP MACRO 016 034 

EFFECTLESS SCHEME FUNCTION 023 036 

EFFECTLESS-EXCEPT-CONS SCHEME FUNCTION 023 039 

EFFS-ANALYZE SCHEME FUNCTION 014 006 

EFFS-ANALYZE-COMBINATION SCHEME FUNCTION 015 031 

EFFS-ANALYZE- IF SCHEME FUNCTION 015 010 

EFFS- INTERSECT SCHEME FUNCTION 023 028 

EFFS-UNION SCHEME FUNCTION 015 002 

EMPTY SCHEME FUNCTION 002 005 

ENV-ANALYZE SCHEME FUNCTION 012 018 

ENVCARCDR SCHEME FUNCTION 043 002 

E ^ A : C SIDE EFFECTS 017 044 

EQCASE MACL ISP MACRO 003 032 

ERASE-ALL-NODES MACLISP MACRO 018 005 

ERASE-NODE MACLISP MACRO . '. 018 004 

ERASE-NODES SCHEME FUNCTION 018 007 

F ILTER-CLOSEREFS SCHEME FUNCTION 039 030 

FIXP SIDE EFFECTS 017 050 

FL0ATP SIDE EFFECTS 017 051 

FN FN-SIOE-EFFECTS 016 035 



FN 
FN 



FN-SIDE-AFFECTEO 016 036 

OKAY-TO-FOLD 016 037 



FN-SIDE-AFFECTED PROPERTY 016 036 

FN-SIDE-EFFECTS PROPERTY 016 035 

GEN-GLOBAL-NAME SCHEME FUNCTION 002 041 

GENFLUSH SCHEME FUNCTION 002 036 

GENTEMP SCHEME FUNCTION 002 030 

HUNKFN &DEF INE 004 028 

HUNKFN MACLISP MACRO 004 030 

HUNKP SIDE EFFECTS 017 049 

J F DATA TYPE 008 038 

INCREMENT MACLISP MACRO 002 014 

INTERSECT SCHEME FUNCTION 006 039 

LABELS DATA TYPE 008 050 

LAMB0A DATA TYPE 008 034 

LAMBDACATE SCHEME FUNCTION 054 006 

|" IST : -SIDE EFFECTS 017 056 



h?™ PICATE SCHEME FUNCTION 061 002 

065 
026 



MACR0 PDEF INE 001 065 



MACRO-EXPAND SCHEME FUNCTION 011 _ 

MAKE-RETURN SCHEME FUNCTION 028 002 

MAP-USER-NAMES SCHEME FUNCTION 065 028 

MAPANALYZE SCHEME FUNCTION 057 023 

MEMQ SIDE EFFECTS 017 058 

META-COMBINATION-LAMBDA SCHEME FUNCTION 021 007 

META-COMBINATION-TRIVFN SCHEME FUNCTION [ 0*0 037 

META-EVALUATE SCHEME FUNCTION 019 007 

META-IF-FUDGE SCHEME FUNCTION 020 010 

META-SUBSTITUTE SCHEME FUNCTION 024 009 

U A " E MACLISP MACRO 004 011 

„ AM i A CCESS MACRO 004 031 

™*"l COMPONENT-NAMES 005 060 

* AME SUPPRESSED-COMPONENT-NAMES 005 061 

™°* ' DATA TYPE 008 012 

I°; IFY SCHEME FUNCTION 008 059 

"°, SIDE EFFECTS 017 053 

NULL SIDE EFFECTS 017 054 
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NUMBERP SIDE EFFECTS 017 046 

OKAY-TO-FOLD PROPERTY 016 037 

OUTPUT-ASET SCHEME FUNCTION 061 017 

PAIRLIS SCHEME FUNCTION 007 002 

PASSl-ANAi. i ZE SCHEME FUNCTION 007 041 

PASSABLE SCHEME FUNCTION 023 042 

PRIN1 SIDE EFFECTS 017 061 

PRINC SIDE EFFECTS 017 062 

PRINT SIDE EFFECTS 017 060 

PRINT-SHORT EXPR 003 016 

PRINT-WARNING SCHEME FUNCTION 003 005 

PROCESS-DEF INE-FORM SCHEME FUNCTION 069 002 

PROCESS-DEF INITION SCHEME FUNCTION 069 020 

PROCESS-FORM SCHEME FUNCTION 068 006 

PRODUCE-ASET SCHEME FUNCTION 046 003 

PRODUCE-COMBINATION SCHEME FUNCTION 051 002 

PRODUCE-COMBINATION-VARIABLE SCHEME FUNCTION 051 026 

PROOUCE-CONTINUATION-RETURN SCHEME FUNCTION 052 023 

PRODUCE- IF SCHEME FUNCTION 045 042 

PRODUCE-LABELS SCHEME FUNCTION 047 003 

PRODUCE-LAMBDA-COMBINATION SCHEME FUNCTION 048 003 

PRODUCE-RETURN SCHEME FUNCTION 053 002 

PRODUCE-RETURN-1 SCHEME FUNCTION 053 021 

PRODUCE-TRIVFN-COMBINATION SCHEME FUNCTION 049 002 

PRODUCE-TRIVFN-COMBINATION-CONTINUATION SCHEME FUNCTION 049 023 

PRODUCE-TRIVFN-COMBINATION-CVARIABLE .. SCHEME FUNCTION 050 002 

PSETQ-ARGS SCHEME FUNCTION 057 002 

PSETQ-ARGS-ENV SCHEME FUNCTION 057 006 

PSETQ-TEMPS SCHEME FUNCTION 057 013 

PSETQIFY SCHEME FUNCTION 055 005 

PSETQIFY-METHOD-2 SCHEME FUNCTION 055 012 

PSETQIFY-METHOD-3 SCHEME FUNCTION 056 002 

READ SIDE EFFECTS 017 065 

READIFY SCHEME FUNCTION 067 042 

REANALYZE1 SCHEME FUNCTION 023 002 

REFD-VARS SCHEME FUNCTION 034 053 

REGSLIST SCHEME FUNCTION 044 004 

REMARK-ON SCHEME FUNCTION 065 002 

REMOVE SCHEME FUNCTION 006 047 

RESET-STATS SCHEME FUNCTION 071 055 

RETURN DATA TYPE 026 049 

RPLACA SIDE EFFECTS 017 041 

RPLACD SIDE EFFECTS 017 042 

RPLACX SIDE EFFECTS 017 043 

SCHMAC PDEF INE 001 064 

SET-UP-ASETVARS SCHEME FUNCTION 044 032 

SETDIFF SCHEME FUNCTION 006 058 

SETPROP SCHEME FUNCTION 006 018 

SEXPRFY SCHEME FUNCTION 070 004 

STATS SCHEME FUNCTION 071 046 

SUBST-CANDIDATE SCHEME FUNCTION 022 006 

SUPPRESSED-COMPONENT-NAMES PROPERTY 005 061 

SX MACLISP MACRO 003 028 

SYMBOLP SIDE EFFECTS 017 048 

TEMPLOC SCHEME FUNCTION 042 029 

TERPRI SIDE EFFECTS 017 063 

TEST-COMPILE SCHEME FUNCTION 007 051 

TRANSDUCE SCHEME FUNCTION 067 002 

TRIV-ANALYZE SCHEME FUNCTION 013 012 

TRIV-ANALYZE-FN-P SCHEME FUNCTION 013 056 

TRIV-LAMBDACATE SCHEME FUNCTION 063 033 

TRIVFN SCHEME FUNCTION 002 009 

TRIVIAL DATA TYPE 026 013 

TRIVIALIZE SCHEME FUNCTION 063 002 

TYI SIDE EFFECTS 017 066 

TYO SIDE EFFECTS 017 064 

TYPE HUNK ACCESS MACRO 005 006 
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TYPEP SIDE EFFECTS 017 047 

UNI0N SCHEME FUNCTION 006 033 

USED-TEMPLOCS SCHEME FUNCTION 064 059 

VARIABLE DATA jype 08 030 

WARN MACL ISP MACRO 003 002 



